Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/06/08 20:26:05 UTC

[GitHub] [tvm-rfcs] junrushao1994 opened a new pull request #5: [RFC] Meta Schedule (AutoTensorIR)

junrushao1994 opened a new pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676834667



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulating its internal IR,

Review comment:
       Both are internal







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664920614



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,440 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet
+of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled. Effectively, it is equivalent
+to random search without a trace, allowing users the flexibility to define arbitrary functions that
+a trace may not support well (e.g. control flow divergence based on the value of intermediate
+random variables), but it rules out any future trace-based analysis.
+
+**Random search by replaying traces.** From a design space generator, our system obtains the
+traces as the search space, and then those traces are replayed repeatedly with a built-in
+interpreter. If sampling instructions are present in the traces, each replay explores a random
+point in the design space of schedules.

Review comment:
       Much clearer to me. One nit: it might be better to illustrate the benefit of trace analysis (i.e., what obvious optimization/search we can do if the trace is available for this workload).
   
   Other than that, I have no further comments and can sign off on this RFC. Thanks for the effort!
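
To make the nit above concrete, here is a minimal sketch of one analysis that becomes possible once a trace is available: counting the size of the design space without running any schedule. It is plain Python and not the Meta Schedule implementation. `Inst`, `num_perfect_tilings`, and `design_space_size` are hypothetical names, the loop extents of 1024 are inferred from the decisions in the printed trace, and the `max_innermost_factor` constraint is ignored.

```python
from math import comb
from typing import Dict, List, NamedTuple


class Inst(NamedTuple):
    """Toy trace instruction (hypothetical, not a TVM class)."""
    name: str
    extent: int = 0  # loop extent, used only by sampling instructions
    n: int = 0       # number of tiles, used only by sampling instructions


def prime_exponents(x: int) -> Dict[int, int]:
    """Return {prime: exponent} for x >= 1 by trial division."""
    exps: Dict[int, int] = {}
    p = 2
    while p * p <= x:
        while x % p == 0:
            exps[p] = exps.get(p, 0) + 1
            x //= p
        p += 1
    if x > 1:
        exps[x] = exps.get(x, 0) + 1
    return exps


def num_perfect_tilings(extent: int, n: int) -> int:
    """Count ordered n-tuples of positive integers whose product is extent."""
    count = 1
    for e in prime_exponents(extent).values():
        # Stars and bars: distribute e copies of one prime over n tiles.
        count *= comb(e + n - 1, n - 1)
    return count


def design_space_size(trace: List[Inst]) -> int:
    """Multiply the number of choices of every sampling instruction."""
    size = 1
    for inst in trace:
        if inst.name == "Sample-Perfect-Tile":
            size *= num_perfect_tilings(inst.extent, inst.n)
    return size


# The seven-instruction matmul trace, with assumed extents of 1024.
trace = [
    Inst("Sample-Perfect-Tile", extent=1024, n=4),
    Inst("Sample-Perfect-Tile", extent=1024, n=4),
    Inst("Sample-Perfect-Tile", extent=1024, n=2),
    Inst("Split"), Inst("Split"), Inst("Split"), Inst("Reorder"),
]
print(design_space_size(trace))  # prints 899756
```

An opaque schedule function offers no such view: the only way to learn anything about its space is to call it.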







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-886889157


   Sorry for the late reply! I will be working on it today





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661788452



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.

Review comment:
       I clarified the two paragraphs:
   
   * **Program replay.** A simple strategy that replays the user-provided schedule function without taking any advantage of the trace.
   
   * **Random search.** Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), repeatedly mutates the random decisions of a randomly chosen trace, and re-executes it.
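
The two strategies can be contrasted with a minimal sketch in plain Python. Nothing here is the Meta Schedule API: `schedule_fn`, `toy_cost`, `program_replay`, and `random_search` are hypothetical names, the loop extents are assumed to be 1024, each trace is reduced to its list of sampled decisions, and the cost model is made up purely so the example has something to minimize.

```python
import random
from typing import List, Tuple

EXTENTS = [1024, 1024, 1024]  # assumed loop extents for i, j, k
Decision = Tuple[int, int]    # (outer, inner) with outer * inner == extent
Trace = List[Decision]        # one recorded decision per sampling instruction


def sample_decision(extent: int, rng: random.Random) -> Decision:
    """Toy sampling instruction: a random perfect 2-way tiling of extent."""
    inner = rng.choice([d for d in range(1, extent + 1) if extent % d == 0])
    return (extent // inner, inner)


def schedule_fn(rng: random.Random) -> Trace:
    """Stand-in for a user-provided schedule function; returns what it sampled."""
    return [sample_decision(extent, rng) for extent in EXTENTS]


def toy_cost(trace: Trace) -> int:
    """Made-up cost model: prefer inner tiles close to 16."""
    return sum(abs(inner - 16) for _, inner in trace)


def program_replay(trials: int, rng: random.Random) -> Trace:
    """Re-run the whole function each trial; results are measured, never edited."""
    return min((schedule_fn(rng) for _ in range(trials)), key=toy_cost)


def random_search(traces: List[Trace], trials: int, rng: random.Random) -> Trace:
    """Mutate one recorded decision of a randomly chosen trace and re-evaluate."""
    best = min(traces, key=toy_cost)
    for _ in range(trials):
        candidate = list(rng.choice(traces))  # copy a randomly chosen trace
        idx = rng.randrange(len(candidate))   # pick one sampling instruction
        candidate[idx] = sample_decision(EXTENTS[idx], rng)  # mutate its decision
        if toy_cost(candidate) < toy_cost(best):
            best = candidate
    return best


rng = random.Random(0)
seed_traces = [schedule_fn(rng) for _ in range(4)]
best_replay = program_replay(trials=32, rng=rng)
best_mutated = random_search(seed_traces, trials=32, rng=rng)
print(toy_cost(best_replay), toy_cost(best_mutated))
```

In this sketch, program replay only ever sees complete fresh samples, while random search keeps every decision of an existing trace except the one it mutates.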
   







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647775615



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)

Review comment:
       Just out of curiosity, will we have something like `sample_tile` that samples not only the factors of `i` but all integers in `[1, i)`?
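
To make the question concrete, here is a minimal pure-Python sketch of the two sampling behaviours being compared. It does not call any TVM API: `sample_perfect_tile_py` and `sample_tile_py` are hypothetical helpers written only for this illustration, and the uniform choice over divisors is an assumption, not how `sample_perfect_tile` is necessarily implemented.

```python
import random
from math import prod


def sample_perfect_tile_py(extent, n, rng):
    """Pick n positive integers whose product is exactly `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Only divisors of what is left are legal, so the tiling stays perfect.
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)
    return factors


def sample_tile_py(extent, rng):
    """Pick any inner tile size in [1, extent); the outer extent rounds up."""
    inner = rng.randrange(1, extent)
    outer = -(-extent // inner)  # ceiling division
    return [outer, inner]


rng = random.Random(0)
perfect = sample_perfect_tile_py(1024, 4, rng)
imperfect = sample_tile_py(1024, rng)

# A perfect tiling covers the loop exactly.
assert prod(perfect) == 1024
# A non-perfect tiling may overshoot the loop extent.
assert imperfect[0] * imperfect[1] >= 1024
```

When the product overshoots the extent, the split loops would presumably need a boundary check, which is one reason a non-perfect variant might be kept separate from the perfect one.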

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.

Review comment:
       IMHO, this subsection can be merged into 3.3, as both are basically talking about the same thing. Using Ansor as an example when introducing the composite schedule would also help readers understand its purpose more easily.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.

Review comment:
       I didn't understand this point. Are you saying a design space is composed of many traces? If so, it might be clearer to name this point "Design Space" and say that a trace with sampled values is a design point in the space, where the space consists of the traces with all possible combinations of sampled values.
   
   In addition, please be careful with the terms "design space" and "explored points". The former is the entire space that could potentially be explored during the search, regardless of whether a given point is ever visited. The latter is the part of the design space that is actually explored during the search. From the descriptions of "Union of traces" and "Fork a trace", I feel the "design space" you are referring to here is just the set of explored traces.
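
A toy illustration of the distinction, in plain Python rather than any meta schedule API. The candidate decisions below are made-up numbers, and the names `sampling_candidates`, `design_space` and `explored` are hypothetical, chosen only for this sketch.

```python
import itertools
import random

# Candidate decisions for each sampling instruction in one trace
# (toy values, not taken from the RFC).
sampling_candidates = {
    "i_tiles": [(4, 2), (2, 4), (8, 1)],
    "k_tiles": [(2, 2), (4, 1)],
}

# Design space: every combination of decisions, whether visited or not.
names = list(sampling_candidates)
design_space = [
    dict(zip(names, combo))
    for combo in itertools.product(*(sampling_candidates[n] for n in names))
]

# Explored points: the subset of design points a search actually visits.
rng = random.Random(0)
explored = rng.sample(design_space, k=3)

assert len(design_space) == 6
assert all(point in design_space for point in explored)
```

Under this reading, one trace with unresolved sampling instructions already spans six design points, and a union of traces would simply concatenate several such spaces.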

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       Might be better to first explain this term.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.

Review comment:
       This might be improved in the future. I can imagine that the failure rate on resource-limited devices would be very high, so most of the tuning time would be spent just generating a valid program.
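
To make the concern concrete, here is a minimal sketch that counts how much of a tiling space a thread-limit check would reject. It is a toy model, not the meta schedule API: `perfect_tiles`, `within_thread_limit`, and `rejection_rate` are hypothetical names, and the assumption that the third tile level of `i` and `j` is bound to CUDA threads is made up for illustration.

```python
import itertools


def perfect_tiles(extent, n):
    """Enumerate every ordered way to write `extent` as a product of n positive integers."""
    if n == 1:
        return [(extent,)]
    result = []
    for d in range(1, extent + 1):
        if extent % d == 0:
            for rest in perfect_tiles(extent // d, n - 1):
                result.append((d,) + rest)
    return result


def within_thread_limit(i_tiles, j_tiles, max_threads=1024):
    """Hypothetical postprocessor check.

    Assumes (for illustration only) that tile level 2 of both loops is bound
    to threads, so threads per block = i_tiles[2] * j_tiles[2].
    """
    return i_tiles[2] * j_tiles[2] <= max_threads


def rejection_rate(extent_i, extent_j, max_threads=1024):
    """Fraction of all 4-level tilings of (i, j) that the check above rejects."""
    space_i = perfect_tiles(extent_i, 4)
    space_j = perfect_tiles(extent_j, 4)
    rejected = sum(
        not within_thread_limit(ti, tj, max_threads)
        for ti, tj in itertools.product(space_i, space_j)
    )
    return rejected / (len(space_i) * len(space_j))
```

Calling `rejection_rate(64, 64, max_threads=16)` against `rejection_rate(64, 64, max_threads=256)` shows how the rejected fraction grows as the limit tightens, which is the case I am worried about for small devices.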

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We engineer the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.

Review comment:
       ```suggestion
   We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.

Review comment:
       This does not seem to be a "search" approach, but just a way to apply a trace? Also, "PPL" is first used here, so it would be better to spell out the full name.
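
The distinction I have in mind can be sketched as follows. This is a toy model under my own assumptions, not the proposed implementation: `ToySchedule`, `sample_categorical`, `replay_program`, and `apply_trace` are hypothetical names. Re-running the schedule function draws fresh decisions each time, while applying a trace reuses the recorded decisions and samples nothing.

```python
import random


class ToySchedule:
    """Hypothetical stand-in for a schedule object that records a trace."""

    def __init__(self, seed=None, decisions=None):
        self.rng = random.Random(seed)
        self.trace = []  # list of (instruction name, decision or arguments)
        self._forced = list(decisions) if decisions is not None else None

    def sample_categorical(self, candidates):
        # Reuse a recorded decision when re-applying a trace; otherwise draw a new one.
        if self._forced is not None:
            choice = self._forced.pop(0)
        else:
            choice = self.rng.choice(candidates)
        self.trace.append(("sample_categorical", choice))
        return choice

    def split(self, loop, factor):
        self.trace.append(("split", (loop, factor)))


def schedule_program(sch):
    """A user-written schedule function: one sampled tile size, one split."""
    factor = sch.sample_categorical([1, 2, 4, 8, 16])
    sch.split("i", factor)


def replay_program(seed):
    """Program replay: re-run the whole function and draw fresh decisions."""
    sch = ToySchedule(seed=seed)
    schedule_program(sch)
    return sch.trace


def apply_trace(trace):
    """Applying a trace: re-run with the recorded decisions, so nothing is sampled."""
    decisions = [d for name, d in trace if name == "sample_categorical"]
    sch = ToySchedule(decisions=decisions)
    schedule_program(sch)
    return sch.trace
```

If "program replay" means the first function called with many seeds, then it is a (naive) search and the text could say so explicitly; if it means the second, it is not a search strategy at all.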

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.

Review comment:
       These read more like a summary of the previous subsections. It would be clearer and more exciting if this subsection simply provided an example that mixes the three styles.
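
As a rough idea of the shape such an example could take, here is a toy model of the "union of spaces" claim. It deliberately does not use the real meta schedule API: every function name below is hypothetical, and design-space points are modelled as plain tuples.

```python
def manual_space():
    """Manual schedule: a single fixed point, no sampling."""
    return {("tile", 16, 8)}


def template_space(extent=128):
    """AutoTVM-style template: every perfect 2-way split of one loop."""
    return {("tile", extent // f, f) for f in range(1, extent + 1) if extent % f == 0}


def rule_space(blocks):
    """Ansor-style rule applied per block (toy rule: only spatial blocks may be inlined)."""
    return {("inline", name) for name, kind in blocks if kind == "spatial"}


def mixed_space(blocks):
    """Union of the three spaces, which the search would then explore jointly."""
    return manual_space() | template_space() | rule_space(blocks)
```

With `blocks = [("relu", "spatial"), ("matmul", "reduction")]`, the manual point `("tile", 16, 8)` already lies inside the template space, and the rule contributes the operator-independent `("inline", "relu")` choice. An example along these lines, written against the actual API, would show the mixing much better than the prose summary.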







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661756838



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies that search the design space, either exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. The design space can be defined by a python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan
+
+[M3a] Core infrastructure of the PPL

Review comment:
       We have a tracking issue of TensorIR here: https://github.com/apache/tvm/issues/7527
   
   Given that the two projects are tightly coupled, we follow TensorIR's M1a/M2a naming scheme for the steps below







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668525299



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* Extensibility of the automation infrastructure in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies that search the design space, either
+exhaustively or efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+may change across runs. Effectively, it is equivalent to random exploration without a trace. This
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present on the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the maximum number of
+    threads should not exceed 1024, but it is a random variable that cannot be determined before
+    actually executing the trace. In this case, we write a postprocessor that errors out when the
+    condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then

Review comment:
       It is a very complicated process, but almost identical to Ansor. I dropped a citation to the Section 5.1 in Ansor's paper for more details
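
For readers who only want the overall shape of the loop, here is a toy sketch assembled from the RFC's own description (mutators, postprocessors, a cost model, epsilon-greedy). It is not the implementation in the PR: the design space is reduced to two thread-count decisions, `cost_model` is a made-up scoring function, epsilon-greedy is applied to population selection for simplicity, and the final step of compiling and measuring on device is omitted. All names are hypothetical.

```python
import random
from typing import List, Optional

Decision = List[int]  # toy point in the design space: [threads_x, threads_y]
CHOICES = [1, 2, 4, 8, 16, 32, 64, 128]
MAX_THREADS = 1024  # the CUDA limit mentioned in the RFC text


def mutate(point: Decision, rng: random.Random) -> Decision:
    """Mutator: jump to a neighbor by resampling one decision."""
    new = list(point)
    new[rng.randrange(len(new))] = rng.choice(CHOICES)
    return new


def postprocess(point: Decision) -> Optional[Decision]:
    """Postprocessor: reject points that exceed the thread limit."""
    return point if point[0] * point[1] <= MAX_THREADS else None


def cost_model(point: Decision) -> float:
    """Made-up cost model: prefers a total thread count close to 256."""
    return -abs(point[0] * point[1] - 256)


def evolutionary_search(
    rng: random.Random,
    population_size: int = 16,
    iterations: int = 10,
    epsilon: float = 0.2,
) -> Decision:
    population: List[Decision] = [[1, 1] for _ in range(population_size)]
    for _ in range(iterations):
        # Mutate every member, then drop candidates the postprocessor rejects.
        candidates = []
        for parent in population:
            child = postprocess(mutate(parent, rng))
            if child is not None:
                candidates.append(child)
        # Rank old and new points by predicted score, best first.
        pool = sorted(population + candidates, key=cost_model, reverse=True)
        # Always keep the best point; fill the rest epsilon-greedily.
        next_population = [pool[0]]
        for rank in range(1, population_size):
            if rng.random() < epsilon:
                next_population.append(rng.choice(pool))  # explore
            else:
                next_population.append(pool[rank])  # exploit
        population = next_population
    return max(population, key=cost_model)


best = evolutionary_search(random.Random(0))
```

Keeping the top-ranked point in every generation means the best predicted score never gets worse; in the real system the highest-scoring schedules would then be built and measured on the target device.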







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668506959



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
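The manual enumeration described above can be sketched in plain Python. This is only an illustrative sketch: `perfect_tilings` is a hypothetical helper (not part of TVM), and the loop extent 8192 is an assumption, chosen because it is the product of the tile sizes `[16, 8, 8, 8]` used in the example.

```python
from typing import Iterator, List


def perfect_tilings(extent: int, n: int) -> Iterator[List[int]]:
    """Yield every list of `n` positive integers whose product equals `extent`."""
    if n == 1:
        yield [extent]
        return
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            # Fix the outermost factor, then tile the remaining extent.
            for rest in perfect_tilings(extent // factor, n - 1):
                yield [factor] + rest


# Assumed extent: 16 * 8 * 8 * 8 = 8192, matching `i_tiles` above.
candidates = list(perfect_tilings(8192, 4))
# Each candidate is one tiling a developer would have to try by hand.
print(len(candidates))
```

Even for a single loop the candidate list is long, which is what motivates describing the space once and letting the system explore it.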
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
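How a list of rules can turn into a design space can be sketched with toy data structures. This is a hedged illustration, not the implementation of `ms.PostOrderApply`: the names `toy_post_order_apply`, `inline_spatial` and `tile_or_not` are hypothetical, a "schedule" is modelled as a list of strings, and the block list is assumed to be already in the visiting order.

```python
from typing import Callable, List

# A toy "schedule" is just the list of transformations applied so far.
ToySchedule = List[str]
# A toy rule maps (schedule, block name) to one or more resulting schedules.
ToyRule = Callable[[ToySchedule, str], List[ToySchedule]]


def inline_spatial(sch: ToySchedule, block: str) -> List[ToySchedule]:
    """Inline blocks whose (hypothetical) name marks them as purely spatial."""
    if block.startswith("spatial"):
        return [sch + [f"compute_inline({block})"]]
    return [sch]


def tile_or_not(sch: ToySchedule, block: str) -> List[ToySchedule]:
    """Return two candidates for reduction blocks: untiled and tiled."""
    if block.startswith("reduce"):
        return [sch, sch + [f"multi_level_tiling({block})"]]
    return [sch]


def toy_post_order_apply(blocks: List[str], rules: List[ToyRule]) -> List[ToySchedule]:
    """Apply every rule to every block; the result is a list of candidate schedules."""
    schedules: List[ToySchedule] = [[]]
    for block in blocks:
        for rule in rules:
            schedules = [new for sch in schedules for new in rule(sch, block)]
    return schedules


space = toy_post_order_apply(["spatial_relu", "reduce_matmul"], [inline_spatial, tile_or_not])
for candidate in space:
    print(candidate)
```

A rule that returns more than one schedule multiplies the number of candidates, which mirrors the `Union[Schedule, List[Schedule]]` return type used for composite schedules later in this document.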
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
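The relationship between a trace, its sampling decisions, and forking can be sketched with a toy data structure. This is a minimal sketch under stated assumptions: `ToyTrace`, `with_decision` and `fork` are hypothetical names and do not reflect the actual trace class, and the decisions are taken from the printed trace example in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTrace:
    """A toy trace: ordered instruction names plus the decisions of sampling instructions."""

    instructions: List[str]
    # Maps the index of a sampling instruction to the decision taken there.
    decisions: Dict[int, List[int]] = field(default_factory=dict)

    def with_decision(self, index: int, decision: List[int]) -> "ToyTrace":
        """Return a copy with one sampling decision replaced (a new point in the space)."""
        new_decisions = dict(self.decisions)
        new_decisions[index] = decision
        return ToyTrace(list(self.instructions), new_decisions)

    def fork(self, extra_instruction: str) -> "ToyTrace":
        """Return a copy that continues with one more instruction."""
        return ToyTrace(self.instructions + [extra_instruction], dict(self.decisions))


trace = ToyTrace(
    instructions=["Sample-Perfect-Tile"] * 3 + ["Split"] * 3 + ["Reorder"],
    decisions={0: [32, 1, 16, 2], 1: [64, 4, 2, 2], 2: [64, 16]},
)
# Same instructions, different decision: another point in the same design space.
other_point = trace.with_decision(2, [128, 8])
# Forking yields two traces; the list stands for the union of their design spaces.
design_space = [trace.fork("Vectorize"), trace.fork("Unroll")]
print(len(design_space), other_point.decisions[2])
```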
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++, Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function,
+each invocation may produce a different scheduled IRModule, because the random decisions may
+change across runs. Effectively, this is random exploration without a trace. It gives users the
+flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any future
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploring
+the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per
+    block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out
+    when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
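The loop described above (mutate, postprocess, score, then pick candidates to measure) can be sketched in plain Python. Everything here is an assumption made for illustration: a candidate is a list of four tile sizes, the second tile is assumed to be the thread count, `toy_cost_model` is an arbitrary stand-in for a learned model, and the epsilon-greedy step is one possible reading of the text rather than the actual algorithm.

```python
import random
from typing import List, Optional

MAX_THREADS = 1024  # the CUDA limit cited above


def mutate(point: List[int], rng: random.Random) -> List[int]:
    """Mutator: move a factor of 2 between two tiles, keeping the product fixed."""
    new_point = list(point)
    src, dst = rng.sample(range(len(point)), 2)
    if new_point[src] % 2 == 0:
        new_point[src] //= 2
        new_point[dst] *= 2
    return new_point


def postprocess(point: List[int]) -> Optional[List[int]]:
    """Postprocessor: reject a candidate whose assumed thread count is too large."""
    threads = point[1]  # assumption: the second tile is bound to threads
    return point if threads <= MAX_THREADS else None


def toy_cost_model(point: List[int]) -> float:
    """Arbitrary score (higher is better); a real cost model is learned from measurements."""
    return -abs(point[-1] - 8) - abs(point[1] - 64) / 64.0


def evolutionary_search(init: List[int], iterations: int = 10, population: int = 16,
                        num_measured: int = 4, epsilon: float = 0.25,
                        seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)
    pool = [init]
    for _ in range(iterations):
        children = []
        for _ in range(population):
            child = postprocess(mutate(rng.choice(pool), rng))
            if child is not None:
                children.append(child)
        # Keep the candidates that the cost model scores highest.
        pool = sorted(pool + children, key=toy_cost_model, reverse=True)[:population]
    # Epsilon-greedy: mostly pick top-scored candidates, sometimes a random one.
    picked = []
    for rank in range(min(num_measured, len(pool))):
        picked.append(rng.choice(pool) if rng.random() < epsilon else pool[rank])
    return picked


picked = evolutionary_search([16, 8, 8, 8])
print(picked)  # candidates that would be compiled and measured on device
```

Only the picked candidates would be built and run; the cost model filters the rest, which is where the efficiency gain over random search is expected to come from.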
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be a Python function that does the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the
+decorated function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm
+in Python by deriving from `PySearchPolicy`; the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost all components of the system are decoupled from each other, and extensions can be
+plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: unlike alternative solutions, we do not
+require any change to the existing codebase, or any extra APIs to learn. This could potentially
+lower the bar for using automation systems.
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully
+automatic sketch generation provides a flexible way to balance the injection of new domain
+knowledge with automation.
+
+Flexibility in customization allows quick try-out on new tasks, new strategies and new hardware
+targets without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides
+convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying

Review comment:
       We should assume TIR is short for TensorIR in all the materials. Perhaps I should remove "the latest generation" here







[GitHub] [tvm-rfcs] tqchen commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661847689



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,367 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR. For example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without taking any advantage of the trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), then repeatedly mutates the random decisions of a randomly chosen trace and re-executes it.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be a Python function that does the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. When the `initialize` method is empty, use a decorator as syntactic sugar; it turns the decorated function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm in Python by deriving from `PySearchPolicy`; the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost every component of the system is decoupled from the others, so extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan

Review comment:
       I feel upstreaming plan should be part of the tracking issue and not part of RFC







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661744635



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)

Review comment:
       I thought a bit. Let's only use "AutoTIR" once in the title, and in other places let's be consistent with the name "meta schedule"







[GitHub] [tvm-rfcs] tqchen commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-884512750


   @areusch @junrushao1994 would you mind following up a bit on this?





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661789516



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
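A minimal sketch of the trace idea, assuming a toy schedule program with a single sampling instruction. The `Trace` class, `run`, and `sample_two_factors` are hypothetical names for illustration only; the real trace lives inside the `Schedule` class and records far more detail.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def sample_two_factors(extent: int, rng: random.Random) -> Tuple[int, int]:
    """Toy sampling instruction: pick (outer, inner) with outer * inner == extent."""
    inner = rng.choice([d for d in range(1, extent + 1) if extent % d == 0])
    return extent // inner, inner


@dataclass
class Trace:
    """Hypothetical trace: ordered instruction names plus the random
    decisions made by the sampling instructions, keyed by position."""

    insts: List[str] = field(default_factory=list)
    decisions: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def record(self, name: str, decision: Optional[Tuple[int, int]] = None) -> None:
        if decision is not None:
            self.decisions[len(self.insts)] = decision
        self.insts.append(name)


def run(
    extent: int,
    rng: random.Random,
    decisions: Optional[Dict[int, Tuple[int, int]]] = None,
) -> Trace:
    """Execute a tiny schedule program and record every instruction.

    Passing `decisions` replays the program with those choices instead of
    sampling, which is how a search strategy visits another point of the space.
    """
    trace = Trace()
    factors = decisions[0] if decisions else sample_two_factors(extent, rng)
    trace.record("Sample-Perfect-Tile", decision=factors)
    trace.record("Split")
    trace.record("Reorder")
    return trace
```

In this picture, a design space is a list of such traces, and forking a trace amounts to copying it and continuing each copy with a different decision.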
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. Can be a python function that does the schedule
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...

Review comment:
       The decorator is just syntactic sugar for constructing this class when the `initialize` method is empty. I will update the doc to clarify.
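One way the relationship between the decorator and the class could look, as a sketch only: `PyCompositeScheduleSketch` and `as_composite_schedule_sketch` are hypothetical stand-ins, not the proposed `PyCompositeSchedule` or `tir.as_composite_schedule` APIs, and a schedule is modeled here as a plain list of strings.

```python
from typing import Callable, List


class PyCompositeScheduleSketch:
    """Hypothetical base class with the two overridable methods."""

    rule_name = "unnamed"

    def initialize(self) -> None:
        # Empty by default; this is the case the decorator covers.
        pass

    def apply(self, sch: List[str], block: str) -> List[str]:
        raise NotImplementedError


def as_composite_schedule_sketch(name: str) -> Callable:
    """Decorator: wrap a plain function as the `apply` method of a subclass."""

    def decorator(func: Callable[[List[str], str], List[str]]):
        class _Wrapped(PyCompositeScheduleSketch):
            rule_name = name

            def apply(self, sch: List[str], block: str) -> List[str]:
                return func(sch, block)

        return _Wrapped()

    return decorator


@as_composite_schedule_sketch(name="multi-level-tiling")
def multi_level_tiling(sch: List[str], block: str) -> List[str]:
    # A real rule would analyze `block` and invoke schedule primitives.
    return sch + [f"tile({block})"]
```

After decoration, `multi_level_tiling` is an instance of the base class whose `initialize` does nothing, which matches the "sugar when `initialize` is empty" reading of the comment above.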







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661844245



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       added an image illustrating the workflow







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668518322



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of the system can be customized easily in pure Python, in C++, or in both. For example, one can develop a new design space generator or a new ProgramRunner in Python.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, each TensorIR schedule primitive handles only a very basic transformation of the IR; for example, `split` only splits a loop into two. In practice, the fine granularity of those primitives often leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make scheduling more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives based on an analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly on its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in the design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: each concrete set of sampling decisions corresponds to one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies that search for efficient schedules, either exhaustively or efficiently.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without making any use of the trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), then repeatedly picks a random trace, mutates its random decisions, and re-executes it.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: CUDA has a hard requirement that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can easily be switched to a customized Python implementation. For example:
+
+**Customize design space in python**. The design space can be defined by a Python function that does the scheduling:
+
+```python
+def schedule_matmul(sch: "tir.Schedule") -> "tir.Schedule":
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. If the `initialize` method would be empty, use a decorator as syntactic sugar; it converts the decorated function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost every component of the system is decoupled from the others, and extensions can be easily plugged in.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       Hey I don't want to be defensive here, but my biased opinion is that it is much simpler compared with AutoTVM and Ansor. AutoTVM requires us to learn some opaque APIs like knobs, and Ansor even designs its own internal IR and DSL, but in meta schedule, everything is just a trace :-)
   
   A drawback I could think of is that we will have to rework Relay integration and Relay operator strategy stuff for meta schedule
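   
   To make the "everything is just a trace" point concrete, here is a minimal, self-contained sketch in plain Python. It is not the meta schedule implementation and does not use TVM: `ToySchedule`, `Instruction` and `perfect_tiles` are hypothetical names invented for this illustration, and the loop extents (1024 and 512) are assumed. It only shows how recording sampling decisions next to ordinary primitives makes a schedule replayable.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instruction:
    """One recorded step: primitive name, inputs, and (for sampling) the decision."""
    kind: str
    inputs: tuple
    decision: Optional[List[int]] = None


def perfect_tiles(extent: int, n: int) -> List[List[int]]:
    """All ways to write `extent` as an ordered product of `n` positive factors."""
    if n == 1:
        return [[extent]]
    result = []
    for f in range(1, extent + 1):
        if extent % f == 0:
            for rest in perfect_tiles(extent // f, n - 1):
                result.append([f] + rest)
    return result


@dataclass
class ToySchedule:
    """Hypothetical stand-in for a traced schedule: it records, but transforms no IR."""
    rng: random.Random
    trace: List[Instruction] = field(default_factory=list)

    def sample_perfect_tile(self, extent, n, decision=None):
        # A fresh run samples a decision; a replay passes the recorded one back in.
        if decision is None:
            decision = self.rng.choice(perfect_tiles(extent, n))
        self.trace.append(Instruction("sample_perfect_tile", (extent, n), decision))
        return decision

    def split(self, loop, factors):
        self.trace.append(Instruction("split", (loop, tuple(factors))))
        return [f"{loop}_{idx}" for idx in range(len(factors))]

    def reorder(self, *loops):
        self.trace.append(Instruction("reorder", loops))


def schedule_matmul(sch: ToySchedule, decisions: Optional[Dict[str, List[int]]] = None):
    decisions = decisions or {}
    i_tiles = sch.sample_perfect_tile(1024, 4, decisions.get("i"))
    k_tiles = sch.sample_perfect_tile(512, 2, decisions.get("k"))
    i0, i1, i2, i3 = sch.split("i", i_tiles)
    k0, k1 = sch.split("k", k_tiles)
    sch.reorder(i0, i1, k0, i2, k1, i3)
    return sch


sch = schedule_matmul(ToySchedule(rng=random.Random(0)))
kinds = [inst.kind for inst in sch.trace]
```

   Calling `schedule_matmul` again with the recorded decisions reproduces the same trace, while omitting them samples a new point in the space.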
   
   







[GitHub] [tvm-rfcs] tkonolige commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647867325



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of the meta schedule DSL and how it can be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR; for example, `split` only splits a loop into two. In practice, the fine granularity of those primitives often leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in the design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The schedule functions generate one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       I think this section needs a lot more detail. My understanding is that this should be more of a technical section on the architecture of what you are adding. Maybe @hogepodge could clarify?
   
   Personally, I would like to see a high level sketch of the meta schedule architecture. For example, descriptions of the main classes and how they interact.
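   
   As a rough illustration of the level of detail meant here, the sketch below wires together toy versions of the pieces this thread mentions (a space generator, a mutator, a postprocessor, a cost model, and an epsilon-greedy evolutionary loop). Everything in it is an assumption made for illustration: the class and function names, the scoring function and the limit of 16 are hypothetical and are not the RFC's API, and on-device measurement is left out entirely.

```python
import random
from typing import Callable, List

Candidate = List[int]  # one design-space point: tile sizes whose product is the loop extent


def smallest_prime_factor(x: int) -> int:
    p = 2
    while p * p <= x:
        if x % p == 0:
            return p
        p += 1
    return x


class SpaceGenerator:
    """Hypothetical generator: draws random perfect tilings of a loop extent."""

    def __init__(self, extent: int, n: int) -> None:
        self.extent, self.n = extent, n

    def sample(self, rng: random.Random) -> Candidate:
        # Pick divisors one at a time so the product always equals `extent`.
        remaining, tiles = self.extent, []
        for _ in range(self.n - 1):
            divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
            tiles.append(rng.choice(divisors))
            remaining //= tiles[-1]
        return tiles + [remaining]


def mutate(cand: Candidate, rng: random.Random) -> Candidate:
    """Mutator: move one prime factor between two tiles, keeping the product unchanged."""
    child = list(cand)
    donors = [i for i, t in enumerate(child) if t > 1]
    if not donors or len(child) < 2:
        return child
    src = rng.choice(donors)
    dst = rng.choice([i for i in range(len(child)) if i != src])
    p = smallest_prime_factor(child[src])
    child[src] //= p
    child[dst] *= p
    return child


def within_limit(cand: Candidate, limit: int = 16) -> bool:
    """Postprocessor: reject candidates whose innermost tile exceeds a hard limit."""
    return cand[-1] <= limit


def toy_cost_model(cand: Candidate) -> float:
    """Stand-in for a learned cost model; higher is better, the preference is arbitrary."""
    return -abs(cand[-1] - 8) - 0.01 * abs(cand[0] - 16)


def evolutionary_search(
    space: SpaceGenerator,
    score: Callable[[Candidate], float],
    rng: random.Random,
    population: int = 16,
    iterations: int = 10,
    top_k: int = 4,
    epsilon: float = 0.2,
) -> List[Candidate]:
    pool: List[Candidate] = []
    while len(pool) < population:
        cand = space.sample(rng)
        if within_limit(cand):
            pool.append(cand)
    for _ in range(iterations):
        parents = sorted(pool, key=score, reverse=True)[:top_k]
        children: List[Candidate] = []
        while len(children) < population - top_k:
            if rng.random() < epsilon:
                child = space.sample(rng)  # explore: fresh random point
            else:
                child = mutate(rng.choice(parents), rng)  # exploit: neighbor of a good point
            if within_limit(child):
                children.append(child)
        pool = parents + children
    return sorted(pool, key=score, reverse=True)[:top_k]


best = evolutionary_search(SpaceGenerator(extent=1024, n=4), toy_cost_model, random.Random(0))
```

   In a real system the top candidates returned here would then be built and measured on device, and the measurements fed back into the cost model.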







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664905546



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of the meta schedule DSL and how it can be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, each TensorIR schedule primitive handles only a very basic transformation of the IR; for example, `split` only splits a loop into two. In practice, the fine granularity of those primitives often leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make scheduling more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives based on an analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly on its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in the design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: each concrete set of sampling decisions corresponds to one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to search the design space, exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without taking any advantage of the trace.

Review comment:
       what about this:
   
   ```markdown
   **Random search by replaying schedule functions.** With a user-provided schedule function
   as a black-box design space generator, our system could repeatedly invoke such an opaque function
   without doing any extra analysis. The function could be written in C++ or Python, or any language
   that implements packed function FFI. If sampling instructions are present in the function, then each
   invocation may result in a different IRModule after being scheduled. Effectively, it is equivalent to
   random search without trace, giving users the flexibility to define arbitrary functions that
   trace may not support well (e.g. control flow divergence based on the value of intermediate random
   variables), but it rules out any future trace-based analysis.
   
   **Random search by replaying traces.** From a design space generator, our system obtains the
   traces as the search space, and then those traces are replayed repeatedly with a builtin interpreter.
   If sampling instructions are present in the traces, each replay explores a random point in the
   design space of schedules.
   ```
   
   CC: @comaniac 
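
To make the first strategy above concrete, here is a minimal toy sketch of random search by replaying a schedule function. It does not use TVM: `sample_two_way_tile`, `schedule_fn`, `toy_cost` and `random_search` are hypothetical names invented for illustration, and the cost function merely stands in for a real hardware measurement.

```python
import random
from typing import Callable, Tuple

Tiling = Tuple[int, int]
Candidate = Tuple[Tiling, Tiling]


def sample_two_way_tile(extent: int, rng: random.Random) -> Tiling:
    """Toy stand-in for a sampling instruction: (outer, inner) with outer * inner == extent."""
    divisors = [d for d in range(1, extent + 1) if extent % d == 0]
    inner = rng.choice(divisors)
    return extent // inner, inner


def schedule_fn(rng: random.Random) -> Candidate:
    """Opaque user-written schedule function; the search loop never inspects its body."""
    i_tiles = sample_two_way_tile(1024, rng)
    # Control flow that depends on a sampled value: easy in a function,
    # awkward to express in a linear trace.
    if i_tiles[1] > 16:
        j_tiles = (1024, 1)  # leave loop j effectively untiled
    else:
        j_tiles = sample_two_way_tile(1024, rng)
    return i_tiles, j_tiles


def toy_cost(candidate: Candidate) -> float:
    """Hypothetical cost that prefers inner tiles close to 32."""
    (_, i_inner), (_, j_inner) = candidate
    return abs(i_inner - 32) + abs(j_inner - 32)


def random_search(fn: Callable[[random.Random], Candidate],
                  cost: Callable[[Candidate], float],
                  num_trials: int,
                  seed: int = 0):
    """Invoke the black-box function repeatedly and keep the cheapest candidate."""
    rng = random.Random(seed)
    best_cost, best = None, None
    for _ in range(num_trials):
        candidate = fn(rng)
        c = cost(candidate)
        if best_cost is None or c < best_cost:
            best_cost, best = c, candidate
    return best_cost, best


if __name__ == "__main__":
    print(random_search(schedule_fn, toy_cost, num_trials=200))
```

The search loop only ever sees the function's return value, which is why this style tolerates arbitrary control flow but leaves nothing for a trace-based analysis to work with.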







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668482353



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits one loop into several nested loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to search the design space,
+exhaustively or efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+may change across runs. Effectively, it is equivalent to
+random exploration without trace, giving users the flexibility to define arbitrary functions
+that trace may not support well (e.g. control flow divergence based on the value of intermediate
+random variables), but it rules out any future trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets

Review comment:
       Yeah this strategy is well described in Ansor's paper so I didn't put too many words here. I am going to add a citation here







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661752729



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles

Review comment:
       Sounds good. I added the explanation below:
   
   ```python
   # Organize the loops into "SSRSRS" 6-level tiles
   sch.reorder(
       i_0, j_0, # S: the 1st spatial tile
       i_1, j_1, # S: the 2nd spatial tile
       k_0,      # R: the 1st reduction tile
       i_2, j_2, # S: the 3rd spatial tile
       k_1,      # R: the 2nd reduction tile
       i_3, j_3, # S: the 4th spatial tile
   )
   ```
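
As a rough illustration of what the "SSRSRS" order means for the generated loop nest, the sketch below runs a matmul in plain Python with the ten tiled loops iterated in exactly that order and checks it against an untiled reference. This is not TVM code: `matmul_reference` and `matmul_ssrsrs` are hypothetical helper names, and the tiny tile sizes are chosen only to keep the example fast.

```python
from itertools import product


def matmul_reference(a, b):
    """Untiled matmul on nested lists, used as the ground truth."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][r] * b[r][j] for r in range(k)) for j in range(m)] for i in range(n)]


def matmul_ssrsrs(a, b, i_tiles, j_tiles, k_tiles):
    """Matmul whose loops are split by the given factors and ordered S, S, R, S, R, S."""
    ie0, ie1, ie2, ie3 = i_tiles
    je0, je1, je2, je3 = j_tiles
    ke0, ke1 = k_tiles
    n, m = ie0 * ie1 * ie2 * ie3, je0 * je1 * je2 * je3
    c = [[0] * m for _ in range(n)]
    # product() varies the last range fastest, so this is the loop nest from outermost to innermost.
    loop_nest = product(
        range(ie0), range(je0),  # S: the 1st spatial tile
        range(ie1), range(je1),  # S: the 2nd spatial tile
        range(ke0),              # R: the 1st reduction tile
        range(ie2), range(je2),  # S: the 3rd spatial tile
        range(ke1),              # R: the 2nd reduction tile
        range(ie3), range(je3),  # S: the 4th spatial tile
    )
    for i_0, j_0, i_1, j_1, k_0, i_2, j_2, k_1, i_3, j_3 in loop_nest:
        # Recover the original loop indices from the split loops.
        i = ((i_0 * ie1 + i_1) * ie2 + i_2) * ie3 + i_3
        j = ((j_0 * je1 + j_1) * je2 + j_2) * je3 + j_3
        k = k_0 * ke1 + k_1
        c[i][j] += a[i][k] * b[k][j]
    return c


# A 4x6 by 6x4 example; each tuple of tile sizes multiplies to the loop extent.
a = [[i * 6 + r for r in range(6)] for i in range(4)]
b = [[r * 4 + j for j in range(4)] for r in range(6)]
tiled = matmul_ssrsrs(a, b, i_tiles=(2, 1, 2, 1), j_tiles=(1, 2, 1, 2), k_tiles=(3, 2))
print(tiled == matmul_reference(a, b))
```

Reordering changes only the iteration order, not the result, which is why the tile sizes can be tuned freely for performance.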







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676879435



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-executing either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule could repeatedly invoke such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)

Review comment:
       Yeah, I should put this more straightforwardly: a trace with decisions strictly guarantees reproducibility.
   
   In detail, a trace is defined as:
   
   ```python
   class Trace:
     instructions: List[Instruction]
     decisions: Dict[Instruction, Any]
   ```
   
   For each sampling instruction in the trace, if it has a corresponding entry in the `decisions` dict, then the output is uniquely determined by the decision and is hence reproducible (example 1); if a corresponding entry is not present, then interpreting the trace introduces randomness (example 2).
   
   
   ```python
   # Example 1.  Trace with deterministic result
   sch.sample_perfect_tile(loop, n=4, decision=[4, 4, 8, 32])
   # Example 2. Trace with randomized result
   sch.sample_perfect_tile(loop, n=4)
   ```
   
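
The reproducibility rule above can be sketched with a toy interpreter. This is plain Python rather than the TVM implementation: the `Instruction` fields, `random_perfect_tile` and `replay` are assumptions made for illustration, and only the shape of `Trace` follows the definition given in this comment.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # identity-based hashing, so every instruction is a distinct dict key
class Instruction:
    name: str
    extent: int  # extent of the loop being tiled (hypothetical field)
    n: int       # number of tile factors to produce (hypothetical field)


@dataclass
class Trace:
    instructions: List[Instruction]
    decisions: Dict[Instruction, Any] = field(default_factory=dict)


def random_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Pick n positive integers whose product is exactly `extent`."""
    factors, remaining = [], extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        chosen = rng.choice(divisors)
        factors.append(chosen)
        remaining //= chosen
    return factors + [remaining]


def replay(trace: Trace, rng: random.Random) -> List[List[int]]:
    """Interpret a trace: a recorded decision wins, otherwise the instruction is re-sampled."""
    outputs = []
    for inst in trace.instructions:
        if inst in trace.decisions:
            outputs.append(list(trace.decisions[inst]))  # example 1: deterministic
        else:
            outputs.append(random_perfect_tile(inst.extent, inst.n, rng))  # example 2: random
    return outputs


tile = Instruction("sample_perfect_tile", extent=4096, n=4)
with_decision = Trace([tile], {tile: [4, 4, 8, 32]})
without_decision = Trace([tile])

print(replay(with_decision, random.Random(0)))
print(replay(without_decision, random.Random(0)))
```

In this sketch, a search strategy that "mutates" a decision simply writes a new entry into `decisions` before replaying, which keeps every explored point reproducible.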







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-875121662


   @tkonolige thanks for the review!
   
   > The biggest one I can think of is the fact that we will have to rewrite all our existing schedules to take advantage of the new infrastructure.
   
   Given that most of the time we use sketch generation (in Ansor's terminology) to generate schedules automatically, we can just remove most of the schedules written in TE. Instead, we do need to rewrite all of Ansor's sketch rules, including (defined in `src/auto_scheduler/search_policy/sketch_policy_rules.h`):
   - Always-Inline
   - Multi-Level-Tiling
   - Multi-Level-Tiling-with-Fusion
   - Add-Cache-Read
   - Add-Cache-Write
   - Add-RFactor
   - Simplify-Compute-with-Const-Tensor
   - Cross-Thread-Reduction
   - Special-Compute-Location
   
   





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664905351



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python or C++ or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, developers may tweak the tile sizes and measure the performance of the generated kernels to explore potential optimizations.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the design space of meta schedule, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies that search the design space, either exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without making any use of the trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), then repeatedly mutates the random decisions of a randomly chosen trace and re-executes it.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: CUDA imposes a hard limit on the number of threads per block (1024 on current GPUs), but the thread count is a random variable that cannot be determined before the trace is actually executed. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can easily be switched to a customized Python implementation. For example:
+
+**Customize design space in python**. The design space can be described by a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the decorated function into the `apply` method:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost every component of the system is decoupled from the others, so extensions can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       CC: @comaniac 







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676997469



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -22,36 +22,48 @@
 
 ## 1. Summary
 
-This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
 approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
 the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
 tensorization and loop partitioning, and customizability on every layer of the automation system.
 
-Meta Schedule is our 3rd generation automatic scheduling system.
+Meta Schedule is the 3rd generation automatic scheduling system.
 
 ## 2. Motivation
 
 **Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
-sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+sequence of transformations. For example, reordering loops for better locality and tensorizing for
 specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
 called "**scheduling**", and each transformation is called a "**schedule primitive**". These

Review comment:
       That makes sense. I will introduce scheduling as a general concept that both TE and TensorIR use to transform the IR into a potentially more optimized form. There is a subtle difference here, because TE relies on a schedule tree which doesn't generate TIR until it is lowered (i.e. `tvm.lower` is called), while TensorIR scheduling directly transforms the IR without the indirection of a schedule tree.
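
A rough sketch of that difference, using toy classes that are purely illustrative (neither `LazySchedule` nor `EagerSchedule` is a TVM API, and the "IR" here is assumed to be nothing more than a loop order):

```python
from typing import List


class LazySchedule:
    """TE-like toy: primitives are recorded and only applied by `lower()`."""

    def __init__(self, loops: List[str]) -> None:
        self.loops = list(loops)             # the "IR": just a loop order
        self.pending: List[List[str]] = []   # recorded, not yet applied

    def reorder(self, *order: str) -> None:
        # Nothing happens to the IR here; the request is only remembered.
        self.pending.append(list(order))

    def lower(self) -> List[str]:
        # All recorded primitives are applied at once, at lowering time.
        loops = list(self.loops)
        for order in self.pending:
            assert sorted(order) == sorted(loops), "reorder must be a permutation"
            loops = list(order)
        return loops


class EagerSchedule:
    """TensorIR-like toy: every primitive rewrites the IR immediately."""

    def __init__(self, loops: List[str]) -> None:
        self.loops = list(loops)

    def reorder(self, *order: str) -> None:
        assert sorted(order) == sorted(self.loops), "reorder must be a permutation"
        self.loops = list(order)


lazy = LazySchedule(["i", "j", "k"])
lazy.reorder("k", "i", "j")
print(lazy.loops)    # the IR is untouched until lowering
print(lazy.lower())  # the reordered loops only exist after lowering

eager = EagerSchedule(["i", "j", "k"])
eager.reorder("k", "i", "j")
print(eager.loops)   # the IR has already been transformed
```

In the lazy variant there is no up-to-date IR to inspect between two primitives, which is the indirection I am referring to.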
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668282356



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the

Review comment:
       What about this: 
   
   > In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code. Take the code snippet in the previous section as an example: a sequence of `split`s is invoked, followed by a `reorder`, and together these are called "SSRSRS" tiling.
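
To make the "SSRSRS" naming concrete, here is a small sketch of how the loop order passed to `reorder` follows from the structure string. The helper `multi_level_tiling_order` is hypothetical and not part of the proposal; it only assumes that each spatial loop is split once per "S" and each reduction loop once per "R", as in the matmul snippet:

```python
from typing import Dict, List, Sequence


def multi_level_tiling_order(
    spatial: Sequence[str],
    reduction: Sequence[str],
    structure: str = "SSRSRS",
) -> List[str]:
    """Return the tile order that a `structure`-style tiling hands to `reorder`."""
    n_spatial = structure.count("S")
    n_reduction = structure.count("R")
    if n_spatial + n_reduction != len(structure):
        raise ValueError("structure may only contain 'S' and 'R'")

    # Name the tiles the same way as the RFC example: i_0, i_1, ...
    tiles: Dict[str, List[str]] = {}
    for loop in spatial:
        tiles[loop] = [f"{loop}_{t}" for t in range(n_spatial)]
    for loop in reduction:
        tiles[loop] = [f"{loop}_{t}" for t in range(n_reduction)]

    order: List[str] = []
    level = {"S": 0, "R": 0}  # next tile level to emit for each kind
    for kind in structure:
        loops = spatial if kind == "S" else reduction
        order.extend(tiles[loop][level[kind]] for loop in loops)
        level[kind] += 1
    return order


print(multi_level_tiling_order(["i", "j"], ["k"]))
```

For the matmul example this reproduces the argument list of the `sch.reorder(...)` call, which is the kind of repetition a composite rule is meant to absorb.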
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668322103



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally

Review comment:
       Yeah, to clarify, that is definitely not what I intended to say, and "eagerly trigger" is not a precise term here.
   
   The background context is:
   - TE scheduling doesn't have an "eager mode": all scheduling instructions manipulate something called a "schedule tree", and no transformation of the TIR takes place until `tvm.lower` is called.
   - To get around this (because AutoScheduler really needs to analyze the IR on the fly), AutoScheduler had to develop its own mini DSL to manipulate its own mini IR.
   - TensorIR doesn't have this problem, because each of its scheduling instructions is an eager and direct transformation of the IR.
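
As an illustration of why analyzing the IR on the fly needs eager transformations, here is a toy sketch. `EagerSchedule` and `tile_until_small` are made-up names for this comment only, and the "IR" is assumed to be a plain mapping from loop name to extent:

```python
from typing import Dict, List


class EagerSchedule:
    """Toy stand-in: the "IR" is an ordered mapping from loop name to extent."""

    def __init__(self, loops: Dict[str, int]) -> None:
        self.loops = dict(loops)

    def split(self, loop: str, factor: int) -> List[str]:
        """Split `loop` into an outer and an inner loop, updating the IR at once."""
        extent = self.loops[loop]
        if extent % factor != 0:
            raise ValueError(f"{factor} does not divide extent {extent}")
        outer, inner = f"{loop}_o", f"{loop}_i"
        new_loops: Dict[str, int] = {}
        for name, ext in self.loops.items():
            if name == loop:
                new_loops[outer] = extent // factor
                new_loops[inner] = factor
            else:
                new_loops[name] = ext
        self.loops = new_loops
        return [outer, inner]


def tile_until_small(sch: EagerSchedule, loop: str, factor: int, max_extent: int) -> str:
    """A rule that inspects the IR after every primitive to decide the next one."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    while sch.loops[loop] > max_extent and sch.loops[loop] % factor == 0:
        # The extent read in the loop condition is only correct because the
        # previous split has already been applied to `sch.loops`.
        loop, _inner = sch.split(loop, factor)
    return loop


sch = EagerSchedule({"i": 64})
outermost = tile_until_small(sch, "i", factor=4, max_extent=8)
print(outermost, sch.loops)
```

With a schedule tree that is only lowered at the end, the rule above would have no updated extents to read between two splits, which is why AutoScheduler keeps its own mini IR.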
   
   
   







[GitHub] [tvm-rfcs] tqchen commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648515238



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. Can be a python function that does the schedule
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan
+
+[M3a] Core infrastructure of the PPL
+* Instruction
+* Trace
+* Composite schedule
+* Sampler
+* Search policy
+* Design space generator
+
+[M3b] Host-side search infra
+* Database
+* Cost model
+* Measure callback
+
+[M3c] RPC-related search infra
+* Measure input, build result, measure result
+* Builder
+* Runner
+
+[M4a] Implementation of rules
+* Various built-in composite schedules
+* Various built-in mutators
+* Various built-in postprocessors
+* Automatic tensorization
+
+[M4b] Relay integration
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do not require any change in existing codebase, or extra APIs to learn. It could potentially lower the bar of using automation systems. 
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully automatic sketch generation provides a flexible way to balance the injection of new domain knowledge against automation.
+
+Flexibility in customization allows quick try-out on new tasks, new strategies and new hardware targets without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying schedule primitives opens the door for meta schedule, our proposed new-generation auto scheduling system.
+
+**AutoTVM** is the 1st generation automation framework in TVM, which requires developers to implement per-operator scheduling templates, and the system could handle the tuning process.
+
+**AutoScheduler (Ansor)** is the 2nd generation automation framework in TVM, whose built-in rules could automatically generate schedule templates for almost all the operators on CPU, GPU, etc.
+
+## 8. Unresolved questions
+
+**Supporting Control Flow and Assertions**
+
+Right now the meta schedule DSL does not support control flow. Although we have not seen any real-world use case so far, one could appear in future workloads.

Review comment:
       I think this refers to the scheduling language that transforms the program, not the program itself.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668259724



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the

Review comment:
       That makes sense :-)
   
   Yeah, I will remove the word "may" here, as it is the only way they can tweak the manual schedule.







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-875137200


   @tkonolige @comaniac I updated the RFC with an explanation that we replay the trace repeatedly to do random search, plus some examples of trace-based analysis. Would you guys like to take another look? Thanks a lot!
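
A minimal sketch of what random search by trace replay could look like, under simplifying assumptions: the trace holds only sampling instructions, each replay re-draws every decision, and `toy_cost` stands in for a real on-device measurement. All names here (`TRACE`, `replay`, `toy_cost`, `random_search`, and this pure-Python `sample_perfect_tile`) are hypothetical and are not the meta schedule implementation.

```python
import random

# Hypothetical toy model; none of these names are TVM APIs.


def sample_perfect_tile(rng, extent, n):
    """Sample n positive integer factors whose product equals extent."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)
    return factors


# Each entry is (instruction, loop name, loop extent, number of tiles).
TRACE = [
    ("sample_perfect_tile", "i", 1024, 4),
    ("sample_perfect_tile", "j", 1024, 4),
    ("sample_perfect_tile", "k", 1024, 2),
]


def replay(trace, rng):
    """Re-execute the trace, drawing a fresh decision per sampling instruction."""
    decisions = {}
    for _, loop, extent, n in trace:
        decisions[loop] = sample_perfect_tile(rng, extent, n)
    return decisions


def toy_cost(decisions):
    """Stand-in for a measurement: prefers innermost tiles close to 8."""
    return sum(abs(tiles[-1] - 8) for tiles in decisions.values())


def random_search(trace, num_trials, seed=0):
    """Replay the trace num_trials times and keep the cheapest candidate."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        candidate = replay(trace, rng)
        if best is None or toy_cost(candidate) < toy_cost(best):
            best = candidate
    return best


best = random_search(TRACE, num_trials=200)
```

Every candidate is a valid point in the design space by construction, because each replay draws tile sizes that multiply back to the loop extent.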





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676989948



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Take the code snippet in the previous section as an example: a sequence of `split`s
+is invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used against the word *primitive*, which means it is a transformation *composed* of those
+*primitives*.
+
+For example, suppose there is a composite schedule called `Inline-All-Elementwise-Operations`, which

Review comment:
       Yeah I added an explanation of `Inline-Elementwise-Operation`:
   
   ```python
   @tvm.script.tir
   def example_func(...):
     for i, j in ...:
       with tir.Block("B") ...:
         B[i, j] = A[i, j] + 1
     for i, j in ...:
       with tir.Block("C") ...:
         C[i, j] = B[i, j] + 1
     for i, j in ...:
       with tir.Block("D") ...:
         D[i, j] = C[i, j] + 1
   
   sch = tir.Schedule(example_func)
   # `InlineElementwiseOperation` is a composite schedule rule that analyzes a given block.
   # If the block contains only elementwise computation, and can be inlined into its consumer,
   # then `sch.compute_inline` is called on that block.
   inliner = InlineElementwiseOperation()
   inliner.apply(schedule=sch, block=sch.get_block("B"))
   inliner.apply(schedule=sch, block=sch.get_block("C"))
   inliner.apply(schedule=sch, block=sch.get_block("D"))
   ```
   
   Below is the result after applying this composite schedule and its corresponding trace:
   
   ```python
   >>> print(tvm.script.asscript(sch.mod))
   
   @tvm.script.tir
   def example_func(...):
     for i, j in ...:
       with tir.Block("D") ...:
         D[i, j] = A[i, j] + 1 + 1 + 1
   
   >>> print(sch.trace)
   
   # Block "B" is elementwise and inlinable, then `sch.compute_inline(B)` is called
   B = sch.get_block("B")
   sch.compute_inline(B)
   # Block "C" is elementwise and inlinable, then `sch.compute_inline(C)` is called
   C = sch.get_block("C")
   sch.compute_inline(C)
   # Block "D" is elementwise but does not have a consumer,
   # so the rule does not call `compute_inline` because it is not inlinable
   D = sch.get_block("D")
   
   ```
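
For readers who want something executable, the behaviour above can be mimicked with a toy model. `ToySchedule` and `ToyInlineElementwise` are hypothetical stand-ins, not the actual `tir.Schedule` or `InlineElementwiseOperation`; the toy treats every block as elementwise and only checks whether a block has a consumer to be inlined into.

```python
# Hypothetical toy model of the example above; not the TVM implementation.


class ToySchedule:
    def __init__(self, producers):
        # Maps each block name to the name of the buffer/block it reads from.
        self.producers = dict(producers)
        self.trace = []

    def get_block(self, name):
        self.trace.append(f'get_block("{name}")')
        return name

    def consumers(self, block):
        return [b for b, p in self.producers.items() if p == block]

    def compute_inline(self, block):
        self.trace.append(f"compute_inline({block})")
        producer = self.producers.pop(block)
        # Consumers of the inlined block now read from its producer directly.
        for b, p in self.producers.items():
            if p == block:
                self.producers[b] = producer


class ToyInlineElementwise:
    def apply(self, schedule, block):
        # All blocks in this toy are elementwise; only inlinability is checked.
        if schedule.consumers(block):
            schedule.compute_inline(block)


sch = ToySchedule({"B": "A", "C": "B", "D": "C"})
rule = ToyInlineElementwise()
for name in ["B", "C", "D"]:
    rule.apply(sch, sch.get_block(name))
```

After the loop, only block `D` remains and it reads from `A`, and the recorded trace contains a `compute_inline` entry for `B` and `C` but not for `D`, matching the trace printed above.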







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647967159



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.

Review comment:
       It is definitely a good idea. Meta schedule enables this by combining the design spaces generated by composite schedule rules with manual ones, but I don't want to commit to a syntax right now, and would prefer to defer that decision until we upgrade TOPI and the Relay operator strategy. CC: @tqchen 







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676866041



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulate its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
 
-As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
-in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
-used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
-equivalent to applying composite schedule rules to each block in TensorIR.
+Composite schedule rules work in a similar way when scheduling TensorIR, as introduced in Section 3.2.
+They analyze the TensorIR and apply schedule primitives directly to it accordingly.
+When such rules are applied to each TensorIR block in a certain order (e.g. Post-DFS Order),
+they generate a sequence of schedule primitives.
+If sampling instructions are present in this sequence,
+the support of the probability space forms a design space of possible schedulings.

Review comment:
       Further exploring the space is not part of the sketch generation phase (the 1st phase in AutoScheduler); instead, it corresponds to the random annotation phase (the 2nd phase in AutoScheduler). I rephrased it a bit to make this clearer.
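
To make the two phases concrete, here is a rough, self-contained sketch in plain Python. It does not use any TVM API: `sample_perfect_tile` below is a toy stand-in written for illustration (it only mimics the idea of the sampling instruction of the same name in the RFC), and the `sketch` list and the loop extent of 512 are made-up assumptions.

```python
import math
import random


def divisors(x):
    """All positive divisors of x."""
    return [d for d in range(1, x + 1) if x % d == 0]


def sample_perfect_tile(extent, n, rng):
    """Toy stand-in: draw n factors whose product is exactly `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        f = rng.choice(divisors(remaining))
        factors.append(f)
        remaining //= f
    # The last factor absorbs whatever is left, so the tiling stays "perfect".
    factors.append(remaining)
    return factors


# Phase 1 (sketch generation): the structure is fixed, tile sizes are left open.
# Each entry is (loop name, loop extent, number of tiles); values are hypothetical.
sketch = [("i", 512, 4), ("j", 512, 4), ("k", 512, 2)]

# Phase 2 (random annotation): fill in the open decisions by sampling.
rng = random.Random(0)
annotated = {name: sample_perfect_tile(extent, n, rng) for name, extent, n in sketch}
products = {name: math.prod(factors) for name, factors in annotated.items()}
```

Each call to phase 2 with a different seed gives a different point in the space that the phase 1 sketch defines.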







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668386464



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of that of meta schedule, and that meta schedule further allows mixing those three
+styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which

Review comment:
       Yes it is certainly possible that the sampling result makes the schedule invalid.
   
   Example: we have a sampling instruction called `Sample-Compute-Location`, which finds opportunities to compute a block under one of all the possible loops - it is possible that it breaks some subsequent schedule primitives.
   
   So the point is it is almost certain that there are "bad points" in the design space, and our system will discard a schedule if any exception is thrown.
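
A minimal sketch of that "discard on exception" behaviour, in plain Python with no TVM dependency. Everything here is hypothetical and for illustration only: `ScheduleError`, `toy_schedule`, and the loop names are made up, and the rule that `i_3` is an invalid location is an arbitrary assumption standing in for a subsequent primitive that fails.

```python
import random


class ScheduleError(Exception):
    """Hypothetical stand-in for an exception thrown by a schedule primitive."""


def toy_schedule(rng):
    # Stand-in for a sampling instruction: pick one of the candidate loops.
    location = rng.choice(["i_0", "i_1", "i_2", "i_3"])
    # Stand-in for a later primitive that breaks for some sampled decisions.
    if location == "i_3":
        raise ScheduleError(f"cannot continue after computing at {location}")
    return location


def sample_valid_schedules(num_trials, seed=0):
    """Run the schedule repeatedly and keep only the samples that did not throw."""
    rng = random.Random(seed)
    valid = []
    for _ in range(num_trials):
        try:
            valid.append(toy_schedule(rng))
        except ScheduleError:
            # A "bad point" in the design space: discard it and move on.
            continue
    return valid


candidates = sample_valid_schedules(num_trials=20)
```

The trade-off of this approach is that some sampling effort is wasted on invalid points, in exchange for not having to prove validity before sampling.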
   







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r663137508



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the system can be customized easily in pure Python or C++ or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR. For example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of that of meta schedule, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware platform.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing of the execution must be enabled
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without using or taking any advantage of the trace.

Review comment:
       Maybe further clarify a bit, saying the user-provided schedule function is complete, meaning that all values such as tile sizes are already specified?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. It is more natural representation of AutoTVM’s schedule templates (knobs) by writing one or more schedule functions in meta schedule with sampling instructions. The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:

Review comment:
       ```suggestion
   For instance, executing the example above results in the following trace:
   ```







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676808880



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -64,8 +76,17 @@ primitives form a domain-specific language (DSL) describing the transformation o
 
 ## 3. Guide-level explanation
 
-In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
-and auto-generate the design space.
+Meta Schedule DSL is a flexible way to define or auto-generate the design space.
+
+This section introduces its syntax of meta schedule DSL and usage in terms of describing and
+auto-generating the design space, more specifically, its APIs for:
+1) Manually constructing a schedule using existing schedule primitives (Section 3.1);
+2) Defining composite schedule to simplify the ap sequence of schedule primitives (Section 3.2);
+3) Describing a design space of possible schedules,
+a.k.a. AutoTVM-style schedule templates (Section 3.3);
+4) Automatically generating the design space, a.k.a. Ansor-style search rules (Section 3.4);

Review comment:
       Yeah, I agree that we should converge on a single name, i.e. AutoScheduler, given that Ansor == AutoScheduler. I should add a citation somewhere so that readers can find the Ansor paper more easily.







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-857460287


   CC: @jroesch @icemelon9 @kparzysz-quic @FrozenGene @jcf94 





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648032244



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.

Review comment:
       > Are you saying a design space is composite of many traces?
   
   This is like Ansor, which deals with several sketches. Each trace is a design space. A union of traces is a union of several design spaces, and it is still a design space.
   
   On the second paragraph, would you like to elaborate a little bit? In my understanding, an "explored point" is a trace whose random choices have all been fixed to specific values, while the "design space" is the set of such discrete points.
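
To make the distinction concrete, here is a minimal toy sketch of how I read it. It is not the meta schedule API: `Trace`, `design_space` and `union_space` are hypothetical names invented for illustration, and a trace is reduced to the candidate values of its sampling instructions. Each trace spans one design space, a point is one full set of decisions, and the union of traces is again a design space.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Trace:
    """Toy stand-in for a trace (hypothetical, not the real data structure)."""
    name: str
    choices: tuple  # one tuple of candidate values per sampling instruction


def design_space(trace):
    """Enumerate every point of one trace: one decision per sampling instruction."""
    return {(trace.name, decision) for decision in product(*trace.choices)}


def union_space(traces):
    """The union of per-trace design spaces is itself a design space."""
    points = set()
    for trace in traces:
        points |= design_space(trace)
    return points


# Two hypothetical traces (comparable to two Ansor sketches) for the same workload
tiled = Trace("tiled", choices=((1, 2, 4), (8, 16)))
inlined = Trace("inlined", choices=((32, 64),))

space = union_space([tiled, inlined])
print(len(space))  # 3 * 2 + 2 = 8 points in total
```

In this reading, an "explored point" is a single element of `space`, for example `("tiled", (4, 16))`.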
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668319732



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and variety
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are

Review comment:
       Revised as follows:
   
   ```markdown
   AutoScheduler (Ansor) generates schedule templates by applying a set of **SearchRule**s to each stage.
   A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
   maintained mini IR.
   
   Composite schedule rules, as introduced in Section 3.2, work in a similar way when scheduling TensorIR.
   Each rule analyzes the TensorIR and applies schedule primitives directly to it accordingly.
   Applying such rules to each TensorIR block in a certain order (e.g. post-DFS order)
   generates a sequence of schedule primitives.
   If sampling instructions are present in this sequence,
   the support of the probability space forms a design space of possible schedulings.
   This process is similar to the *sketch generation* phase in AutoScheduler.
   ```
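
The process described in the revised text can be sketched roughly as follows. This is a toy model under stated assumptions, not the proposed implementation: `Block`, `inline_rule`, `tiling_rule` and `generate` are hypothetical names, blocks are reduced to a name and a kind, and instructions are recorded as plain strings. It only shows the shape of the idea: visit blocks in post-DFS order, let each rule emit instructions, and collect them into one sequence.

```python
class Block:
    """Toy stand-in for a TensorIR block; `children` are the blocks producing its inputs."""

    def __init__(self, name, kind, children=()):
        self.name = name
        self.kind = kind
        self.children = list(children)


def post_order(block):
    """Yield blocks in post-DFS order: producers first, then the consumer."""
    for child in block.children:
        yield from post_order(child)
    yield block


def inline_rule(block):
    """Hypothetical rule: inline pure spatial (elementwise) blocks."""
    if block.kind == "elementwise":
        return [f"ComputeInline({block.name})"]
    return []


def tiling_rule(block):
    """Hypothetical rule: sample tile sizes, then split, for reduction blocks."""
    if block.kind == "reduction":
        return [f"SamplePerfectTile({block.name})", f"Split({block.name})"]
    return []


def generate(root, rules):
    """Apply every rule to every block in post-DFS order and record the instructions."""
    trace = []
    for block in post_order(root):
        for rule in rules:
            trace.extend(rule(block))
    return trace


# relu consumes the output of matmul, so matmul is visited first
matmul = Block("matmul", "reduction")
relu = Block("relu", "elementwise", children=[matmul])
trace = generate(relu, [inline_rule, tiling_rule])
print(trace)
```

The sampling instruction in the resulting sequence is what turns a single schedule into a space of schedules; a sequence without any sampling instruction describes exactly one point.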







[GitHub] [tvm-rfcs] tkonolige commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r662453753



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**

Review comment:
       ```suggestion
   **Benefits of Meta Schedule**
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* Every component of the automation infrastructure is extensible and can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes the TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of what meta schedule can express, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware platform.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies that search the design space, exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without using or taking any advantage of trace.

Review comment:
       I don't understand the purpose of this search strategy. Is this basically a nop?
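
One possible reading of the distinction, as a minimal sketch that does not use TVM: replaying re-runs the schedule function, so every sampling instruction draws a fresh decision, whereas a trace-based strategy keeps the recorded decisions and changes only some of them. Under that reading, replay would not be a no-op, because each run lands on a new random point in the design space. The names `perfect_tiles`, `schedule_fn`, `replay` and `mutate_one` are hypothetical stand-ins for illustration, not meta schedule APIs, and the loop extent of 1024 is an assumption.

```python
import random


def perfect_tiles(extent, n, rng):
    """Sample n positive integer factors whose product equals extent."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Any divisor of what is left keeps the tiling "perfect".
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)
    return factors


def schedule_fn(rng):
    """Toy schedule function: returns the sampling decisions it made."""
    return {
        "i": perfect_tiles(1024, 4, rng),
        "j": perfect_tiles(1024, 4, rng),
        "k": perfect_tiles(1024, 2, rng),
    }


def replay(num_trials, seed=0):
    """Replay-style search: re-run the schedule function from scratch each time."""
    rng = random.Random(seed)
    return [schedule_fn(rng) for _ in range(num_trials)]


def mutate_one(decisions, rng):
    """Trace-style step: keep every decision except one, which is re-sampled."""
    loop = rng.choice(sorted(decisions))
    mutated = dict(decisions)
    mutated[loop] = perfect_tiles(1024, len(decisions[loop]), rng)
    return mutated
```

In this toy model, `replay(5)` yields five independently sampled candidates, while `mutate_one` produces a neighbor of an existing candidate that differs in at most one loop's tiling.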

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* Every component of the automation infrastructure is extensible and can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes the TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of what meta schedule can express, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware platform.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies that search the design space, exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without using or taking any advantage of trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. a user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), then repeatedly mutates the random decisions of a randomly chosen trace and re-executes it.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per block should not exceed 1024, but the thread count is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find candidate schedules in the design space, then applies postprocessors and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized Python implementation. For example:
+
+**Customize design space in python**. The design space can be described by a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: tir.Schedule) -> tir.Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be easily plugged in.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       Are there really no drawbacks? For example, do we have to rewrite every schedule?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)

Review comment:
       ```suggestion
   * Feature Name: Meta Schedule (Formerly AutoTIR)
   ```
   
   I think we should be really clear that Meta Schedule is the only name for this now.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**

Review comment:
       Maybe make these H3s? (Use `###`)

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it is inextensible to hundreds of operators and variety hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.

Review comment:
       ```suggestion
   * The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it is inextensible to hundreds of operators and variety hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.

Review comment:
       ```suggestion
   As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
   ```
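
For readers following the Section 3.1 hunk quoted above, the effect of the `sch.split` and `sch.reorder` calls can be imitated with ordinary Python loops. This is only a sketch in plain Python, not TVM code. The helper names `fuse_index`, `tiled_matmul` and `naive_matmul` are hypothetical, the tile sizes are shrunk so the loops finish quickly, and it assumes that splitting a loop by `factors` yields loops whose extents are exactly those factors.

```python
import itertools


def fuse_index(parts, factors):
    """Rebuild the original loop index from the split loop variables."""
    index = 0
    for part, factor in zip(parts, factors):
        index = index * factor + part
    return index


def tiled_matmul(a, b, i_tiles, j_tiles, k_tiles):
    """Matrix multiply with loops arranged in the "SSRSRS" order of the example."""
    n, m = len(a), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Loop order after reorder: i_0, j_0, i_1, j_1, k_0, i_2, j_2, k_1, i_3, j_3
    extents = [
        i_tiles[0], j_tiles[0],  # S: the 1st spatial tile
        i_tiles[1], j_tiles[1],  # S: the 2nd spatial tile
        k_tiles[0],              # R: the 1st reduction tile
        i_tiles[2], j_tiles[2],  # S: the 3rd spatial tile
        k_tiles[1],              # R: the 2nd reduction tile
        i_tiles[3], j_tiles[3],  # S: the 4th spatial tile
    ]
    # itertools.product varies the last position fastest, like nested for-loops.
    for i0, j0, i1, j1, k0, i2, j2, k1, i3, j3 in itertools.product(
        *(range(extent) for extent in extents)
    ):
        i = fuse_index((i0, i1, i2, i3), i_tiles)
        j = fuse_index((j0, j1, j2, j3), j_tiles)
        k = fuse_index((k0, k1), k_tiles)
        c[i][j] += a[i][k] * b[k][j]
    return c


def naive_matmul(a, b):
    """Reference result with the untiled i, j, k loop nest."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Small illustrative tile sizes (assumed here; the RFC example uses larger ones).
i_tiles, j_tiles, k_tiles = [2, 1, 2, 1], [1, 2, 1, 2], [2, 2]
a = [[i * 4 + k for k in range(4)] for i in range(4)]
b = [[k * 4 + j + 1 for j in range(4)] for k in range(4)]
assert tiled_matmul(a, b, i_tiles, j_tiles, k_tiles) == naive_matmul(a, b)
```

Changing the tile sizes only changes the order in which the same multiply-accumulate steps are visited, which is why tile sizes are a natural tuning parameter.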

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.

Review comment:
       ```suggestion
   In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. "**Design space**" is the set of all possible schedulings with respect to a TensorIR program.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it is inextensible to hundreds of operators and variety hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...

Review comment:
       Delete this line
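
On the Section 3.3 part of the hunk quoted above: the space opened up by `sch.sample_perfect_tile(i, n=4)` can be pictured with a small plain-Python sketch. This is not the TVM implementation. It assumes that a "perfect" tiling means `n` positive integer factors whose product equals the loop extent, and it samples uniformly, which the RFC text does not specify. The names `perfect_tiles` and `toy_sample_perfect_tile` are hypothetical.

```python
import random


def perfect_tiles(extent, n):
    """List every ordered way to write `extent` as a product of n positive integers."""
    if n == 1:
        return [[extent]]
    tilings = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                tilings.append([factor] + rest)
    return tilings


def toy_sample_perfect_tile(extent, n, rng):
    """Draw one tiling uniformly at random (uniformity is an assumption)."""
    return rng.choice(perfect_tiles(extent, n))


rng = random.Random(0)
i_tiles = toy_sample_perfect_tile(16, 4, rng)  # four factors for a loop of extent 16
k_tiles = toy_sample_perfect_tile(8, 2, rng)   # two factors for a loop of extent 8
print(i_tiles, k_tiles)
```

Each draw picks one point in the design space; the same schedule code then runs unchanged with whichever factors were sampled.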

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it is inextensible to hundreds of operators and variety hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).

Review comment:
       ```suggestion
   * Provides unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style schedules.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it is inextensible to hundreds of operators and variety hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.

Review comment:
       ```suggestion
   * Extensibility of all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of the system can be customized easily in pure Python, in C++, or in both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:

Review comment:
       ```suggestion
   Developers may implement their own rules in either Python or C++. They may specify which rules to use with the following syntax:
   ```
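
For readers following this thread, here is a minimal sketch of what "applying a list of rules to each block in post order" could look like. It is plain Python and does not use TVM: the `Block` class, the two rule functions, and `post_order_apply` are all hypothetical stand-ins, and the rules only return the names of the primitives they would invoke rather than transforming any IR.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a TensorIR block; not a TVM class."""
    name: str
    kind: str  # "root", "spatial", or "reduction"
    children: List["Block"] = field(default_factory=list)


# A rule inspects a block and returns the primitives it would apply (possibly none).
Rule = Callable[[Block], List[str]]


def inline_pure_spatial(block: Block) -> List[str]:
    # Toy analysis: inline a block only when it is purely spatial.
    return ["compute_inline"] if block.kind == "spatial" else []


def multi_level_tiling(block: Block) -> List[str]:
    # Toy analysis: tile a block only when it contains a reduction.
    return ["split", "reorder"] if block.kind == "reduction" else []


def post_order_apply(root: Block, rules: List[Rule]) -> List[Tuple[str, str]]:
    """Visit children before their parent and try every rule on each block."""
    applied: List[Tuple[str, str]] = []
    for child in root.children:
        applied.extend(post_order_apply(child, rules))
    for rule in rules:
        for primitive in rule(root):
            applied.append((root.name, primitive))
    return applied


program = Block("root", "root", children=[
    Block("matmul", "reduction"),
    Block("relu", "spatial"),
])
result = post_order_apply(program, [multi_level_tiling, inline_pure_spatial])
print(result)
```

The point of the sketch is only the control flow: each rule is a small, independent function, and the generator decides the traversal order, so adding a custom rule means appending one more callable to the list.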







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668299195



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of
+  the system can be customized easily in pure Python, in C++, or in both. For example, one can
+  develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are

Review comment:
       Good point! The writing here is a bit confusing. Let me try to clarify a bit
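
To make the role of a sampling instruction inside a rule more concrete, below is a rough sketch of what a "perfect tile" sampler might do. It assumes that "perfect" means the sampled factors multiply exactly to the loop extent. The function is a hypothetical pure-Python illustration that takes an integer extent instead of a loop handle; it is not the implementation behind `sch.sample_perfect_tile`, and it makes no attempt to sample uniformly.

```python
import random
from typing import List


def divisors(x: int) -> List[int]:
    """All positive divisors of x, in increasing order."""
    return [d for d in range(1, x + 1) if x % d == 0]


def sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Return n positive integers whose product is exactly `extent`.

    Hypothetical helper: each of the first n - 1 factors is drawn from the
    divisors of what is left, and the last factor absorbs the remainder.
    """
    factors: List[int] = []
    remaining = extent
    for _ in range(n - 1):
        f = rng.choice(divisors(remaining))
        factors.append(f)
        remaining //= f
    factors.append(remaining)
    return factors


rng = random.Random(0)
tiles = sample_perfect_tile(1024, n=4, rng=rng)
print(tiles)
```

Each call yields one point in the tiling space, so a rule that calls such a sampler describes a whole family of schedules rather than a single one.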







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668383535



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of
+  the system can be customized easily in pure Python, in C++, or in both. For example, one can
+  develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space

Review comment:
       `Trace` is more of an underlying mechanism that is not strictly user-facing, which is why I put it in the reference-level section rather than the guide-level one. wdyt?
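
As a rough illustration of why a trace can stay out of the user-facing surface, the sketch below records instructions as a side effect of running an ordinary schedule function. Everything here is an assumption made for illustration: `TracedSchedule`, its two methods, and the tuple layout of a trace entry are hypothetical and do not mirror TVM's `Trace` class.

```python
import random
from typing import Any, List, Tuple


class TracedSchedule:
    """Hypothetical recorder; not TVM's Schedule or Trace class."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Each entry is (instruction name, inputs, sampling decision or None).
        self.trace: List[Tuple[str, tuple, Any]] = []

    def sample_categorical(self, candidates: List[int]) -> int:
        # The decision is the index that was drawn, stored next to the instruction.
        decision = self.rng.randrange(len(candidates))
        self.trace.append(("sample_categorical", tuple(candidates), decision))
        return candidates[decision]

    def split(self, loop: str, factor: int) -> Tuple[str, str]:
        # Deterministic instructions are recorded without a decision.
        self.trace.append(("split", (loop, factor), None))
        return f"{loop}_0", f"{loop}_1"


def schedule_fn(sch: TracedSchedule) -> None:
    """User code only calls schedule methods; it never touches the trace."""
    factor = sch.sample_categorical([4, 8, 16])
    sch.split("i", factor)


sch = TracedSchedule(seed=0)
schedule_fn(sch)
instructions = [name for name, _, _ in sch.trace]
decision = sch.trace[0][2]
print(instructions, decision)
```

In this toy version the user writes only `schedule_fn`, while the recorded list of instructions and decisions is what a search algorithm would inspect or mutate.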







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668514392



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of
+  the system can be customized easily in pure Python, in C++, or in both. For example, one can
+  develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
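
To make the sampling step concrete, here is a minimal, TVM-independent sketch of what a perfect-tile sampler could do: enumerate every way to write a loop extent as a product of `n` positive integers, then pick one at random. The helper names `perfect_tiles` and `sample_tile` are hypothetical and not part of the proposed API; the actual `sch.sample_perfect_tile` may use a different algorithm and extra constraints.

```python
import math
import random
from typing import List


def perfect_tiles(extent: int, n: int) -> List[List[int]]:
    """Enumerate every ordered list of n positive integers whose product is `extent`."""
    if n == 1:
        return [[extent]]
    tilings = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            # Fix the first tile size, then tile the remaining extent recursively.
            for rest in perfect_tiles(extent // factor, n - 1):
                tilings.append([factor] + rest)
    return tilings


def sample_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Pick one perfect tiling uniformly at random (hypothetical stand-in)."""
    return rng.choice(perfect_tiles(extent, n))


rng = random.Random(0)
i_tiles = sample_tile(1024, 4, rng)
# Whatever was sampled, the tile sizes always multiply back to the loop extent.
assert math.prod(i_tiles) == 1024
print(i_tiles)
```

Each call to `sample_tile` corresponds to one point in the design space; calling it repeatedly covers the space of tilings that the fixed `i_tiles = [16, 8, 8, 8]` in Section 3.1 pins to a single choice.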
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
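
As a rough illustration of how applying rules to each block can generate a design space, the sketch below models a schedule as a plain list of primitive names and a rule as a function that returns one or more schedules. Everything here (`apply_rules`, `inline_rule`, `tile_rule`, the string-based blocks) is a hypothetical stand-in, and the blocks are assumed to be given already in post-order; the real `ms.PostOrderApply` may behave differently.

```python
from typing import Callable, List

Schedule = List[str]  # toy schedule: the primitives applied so far
Rule = Callable[[Schedule, str], List[Schedule]]


def inline_rule(sch: Schedule, block: str) -> List[Schedule]:
    """Inline blocks whose name marks them as elementwise; leave others alone."""
    if block.startswith("elemwise"):
        return [sch + [f"compute_inline({block})"]]
    return [sch]


def tile_rule(sch: Schedule, block: str) -> List[Schedule]:
    """Fork on matmul blocks: one design with tiling, one without."""
    if "matmul" in block:
        return [sch + [f"tile({block})"], sch]
    return [sch]


def apply_rules(blocks: List[str], rules: List[Rule]) -> List[Schedule]:
    """Apply every rule to every block; a rule that returns several schedules forks the space."""
    designs: List[Schedule] = [[]]
    for block in blocks:
        for rule in rules:
            designs = [out for sch in designs for out in rule(sch, block)]
    return designs


designs = apply_rules(["elemwise_add", "matmul"], [inline_rule, tile_rule])
for design in designs:
    print(design)
```

The result is a list of schedules rather than a single one, which matches the idea that the design space is the union of the alternatives each rule produces.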
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of the meta schedule design space, and that meta schedule further allows mixing those
+three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine the generic rules that Ansor provides with
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
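
The recording and replay of a trace can be sketched without TVM. In the toy model below, `ToySchedule` only records instructions and does not transform any IR, and `replay` re-executes a recorded trace while drawing fresh decisions for the sampling instructions. All names (`ToySchedule`, `Instruction`, `sample_tile2`, `replay`) are hypothetical, and recovering the loop extent from the recorded decision is a simplification made for this sketch.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instruction:
    name: str
    decision: Optional[List[int]] = None  # set only for sampling instructions


@dataclass
class ToySchedule:
    """Records every call as an instruction; it does not transform any IR."""
    rng: random.Random
    trace: List[Instruction] = field(default_factory=list)

    def sample_tile2(self, extent: int) -> List[int]:
        # Sample a two-way perfect tiling [outer, inner] of the extent.
        inner = self.rng.choice([d for d in range(1, extent + 1) if extent % d == 0])
        decision = [extent // inner, inner]
        self.trace.append(Instruction("Sample-Perfect-Tile", decision))
        return decision

    def split(self, factors: List[int]) -> None:
        self.trace.append(Instruction("Split"))

    def reorder(self) -> None:
        self.trace.append(Instruction("Reorder"))


def schedule_fn(sch: ToySchedule) -> None:
    tiles = sch.sample_tile2(64)
    sch.split(tiles)
    sch.reorder()


def replay(trace: List[Instruction], rng: random.Random) -> ToySchedule:
    """Re-run the recorded instructions, drawing fresh decisions for sampling ones."""
    sch = ToySchedule(rng)
    tiles: List[int] = []
    for inst in trace:
        if inst.name == "Sample-Perfect-Tile":
            extent = inst.decision[0] * inst.decision[1]
            tiles = sch.sample_tile2(extent)
        elif inst.name == "Split":
            sch.split(tiles)
        else:
            sch.reorder()
    return sch


first = ToySchedule(random.Random(0))
schedule_fn(first)
second = replay(first.trace, random.Random(1))
print([inst.name for inst in first.trace])
print(first.trace[0].decision, second.trace[0].decision)
```

Both runs share the same instruction sequence and differ only in the recorded decisions, which is the sense in which one trace describes a whole design space.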
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+can change across runs. Effectively, it is equivalent to random exploration without a trace. This
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per
+    block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out when
+    the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
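
The loop described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the proposed implementation: a candidate is just a pair of thread extents, `predicted_score` is a made-up stand-in for the cost model, and the function returns the best predicted candidate instead of compiling and measuring on a device. All names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Candidate = Tuple[int, int]  # toy decisions: (threads_x, threads_y)
DIVISORS = [d for d in range(1, 129) if 128 % d == 0]


def mutate(cand: Candidate, rng: random.Random) -> Candidate:
    """Mutator: move to a neighbor by re-sampling one of the two decisions."""
    new = list(cand)
    new[rng.randrange(2)] = rng.choice(DIVISORS)
    return (new[0], new[1])


def postprocess(cand: Candidate) -> Optional[Candidate]:
    """Postprocessor: reject candidates that exceed 1024 threads per block."""
    return cand if cand[0] * cand[1] <= 1024 else None


def predicted_score(cand: Candidate) -> float:
    """Stand-in cost model (higher is better); it arbitrarily favors 256 threads."""
    return -abs(cand[0] * cand[1] - 256)


def evolutionary_search(rng: random.Random, population_size: int = 8,
                        iterations: int = 20, epsilon: float = 0.2) -> Candidate:
    # Initial population: random candidates that pass the postprocessor.
    population: List[Candidate] = []
    while len(population) < population_size:
        cand = postprocess((rng.choice(DIVISORS), rng.choice(DIVISORS)))
        if cand is not None:
            population.append(cand)
    for _ in range(iterations):
        children = [postprocess(mutate(p, rng)) for p in population]
        pool = population + [c for c in children if c is not None]
        pool.sort(key=predicted_score, reverse=True)
        # Epsilon-greedy: usually keep a top-ranked candidate, sometimes a random one.
        population = [
            rng.choice(pool) if rng.random() < epsilon else pool[rank]
            for rank in range(population_size)
        ]
    return max(population, key=predicted_score)


best = evolutionary_search(random.Random(0))
print(best, best[0] * best[1])
```

In the proposed system, the final step would hand the highest-scoring schedules to the builder and runner for measurement; the sketch stops at the cost model's prediction.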
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example:
+
+**Customize design space in python**. The design space can be a Python function that does the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. If the `initialize` method is empty, use a decorator as syntactic sugar; it converts the
+decorated function into the `apply` method:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
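
One way such a decorator could work is sketched below: it wraps a plain function in a subclass whose `apply` forwards to the function and whose `initialize` stays empty. The `PyCompositeSchedule` class here is a stand-in written for this sketch, the schedule is modeled as a list of strings, and `as_composite_schedule` is a hypothetical reimplementation; none of this is the proposed `tir.as_composite_schedule` itself.

```python
from typing import Callable, List


class PyCompositeSchedule:
    """Stand-in for the proposed base class; only the two hooks are modeled."""

    name = "unnamed"

    def initialize(self) -> None:
        pass

    def apply(self, sch: List[str], block: str) -> List[str]:
        raise NotImplementedError


def as_composite_schedule(name: str) -> Callable:
    """Wrap a plain function into a rule object whose `apply` calls the function."""

    def decorator(func: Callable[[List[str], str], List[str]]) -> PyCompositeSchedule:
        class _FunctionRule(PyCompositeSchedule):
            def apply(self, sch: List[str], block: str) -> List[str]:
                return func(sch, block)

        _FunctionRule.name = name
        return _FunctionRule()

    return decorator


@as_composite_schedule(name="inline-elemwise")
def inline_elemwise(sch: List[str], block: str) -> List[str]:
    # Toy analysis: inline any block whose name marks it as elementwise.
    if block.startswith("elemwise"):
        return sch + [f"compute_inline({block})"]
    return sch


print(inline_elemwise.name, inline_elemwise.apply([], "elemwise_add"))
```

After decoration, `inline_elemwise` is a rule object rather than a function, so it can be passed anywhere a class-based rule from Method 1 is accepted.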
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions
+can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: unlike alternative solutions, we do
+not require any change to the existing codebase, or extra APIs to learn. It could potentially lower
+the bar for using automation systems.
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully
+automatic sketch generation provides a flexible way to balance the injection of new domain knowledge
+against automation.
+
+Flexibility in customization allows quick try-out on new tasks, new strategies and new hardware
+targets without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides
+convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying
+schedule primitives opens the door for meta schedule, our proposed new-generation auto scheduling
+system.
+
+**AutoTVM** is the 1st generation automation framework in TVM, which requires developers to
+implement per-operator scheduling templates, and the system could handle the tuning process.
+
+**AutoScheduler (Ansor)** is the 2nd generation automation framework in TVM, whose built-in rules
+could automatically generate schedule templates for almost all the operators on CPU, GPU, etc.
+
+## 8. Unresolved questions
+
+**Supporting Control Flow and Assertions**
+
+Right now the meta schedule DSL does not support control flow. Although we didn’t see any real-world

Review comment:
       I am not super certain what the best syntax should look like. I added some explanation:
   
   ```markdown
   
   **Control Flow**
   
   The meta schedule DSL does not support control flow yet. Although there are no reports of a
   real-world use case at the time of writing, it is possible that one could appear in some future
   workloads. The best syntax for control flow is not determined yet, but a working example could be
   TensorFlow's `tf.cond`.
   
   **Assertion**
   
   Sampling instructions may lead to wrong schedules on CUDA, e.g. the resulting program uses too much
   shared memory, too many threads, etc. Such cases are detected by a postprocessor. To accelerate the
   process, we could introduce an assertion statement that exits early if the GPU code is not
   valid, and its syntax can be:
   
   ``"python
   sch.assert(j_2 * j_2 <= 1024)
   ``"
   
   ```







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664905216



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more naturally by writing one or more schedule functions in meta schedule with sampling instructions. The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without using or taking any advantage of trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), repetitively mutates the random decisions of a random trace and re-executes the traces.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily replaced with a custom Python implementation. For example,
+
+**Customize design space in Python**. The design space can be a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: "tir.Schedule") -> "tir.Schedule":
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the decorated function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost all components of the system are decoupled from one another, and extensions can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       what about this:
   
   ```markdown
   **Random search by replaying schedule functions.** With a user-provided schedule function
   as a black-box design space generator, our system can repeatedly invoke this opaque function
   without doing any extra analysis. The function can be written in C++ or Python, or any language
   that implements the packed function FFI. If sampling instructions are present in the function,
   then each invocation may result in a different IRModule after being scheduled. Effectively, it is
   equivalent to random search without a trace, giving users the flexibility to define arbitrary
   functions that traces may not support well (e.g. control flow divergence based on the value of
   intermediate random variables), but it rules out any future trace-based analysis.
   
   **Random search by replaying traces.** From a design space generator, our system obtains the
   traces as the search space, and then those traces are replayed repeatedly with a builtin
   interpreter. If sampling instructions are present in the traces, each replay explores a random
   point in the design space of schedules.
   ```







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676879435



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Most of these strategies work by re-executing either a function or a
+trace with a builtin interpreter in meta schedule; this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule can repeatedly invoke this opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)

Review comment:
       Yeah, I should state this more directly: a trace with decisions strictly guarantees reproducibility.
   
   In detail, a trace is defined as:
   
   ```python
   class Trace:
     instructions: List[Instruction]
     decisions: Dict[Instruction, Any]
   ```
   
   For each sampling instruction in the trace, if it has a corresponding entry in the `decisions` dict, then the output is uniquely determined by the decision, hence reproducible (example 1); if no corresponding entry is present, then interpreting the trace introduces randomness (example 2).
   
   
   ```python
   # Example 1. Trace with deterministic result
   l1, l2 = sch.sample_perfect_tile(loop, n=2, decision=[4, 32])  # Deterministic l1 = 4, l2 = 32
   # Example 2. Trace with randomized result
   l1, l2 = sch.sample_perfect_tile(loop, n=2)  # l1 and l2 are random
   ```
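
The rule above can be sketched in plain Python with no TVM dependency. This is only an illustrative sketch: `Instruction`, `Trace`, `sample_perfect_tile_2` and `replay` are hypothetical stand-ins, not the actual meta schedule classes or interpreter. It assumes a single kind of sampling instruction that splits a loop extent into two factors.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Instruction:
    """Hypothetical stand-in for a sampling instruction recorded in a trace."""
    name: str
    extent: int  # loop extent to be split into two factors


@dataclass
class Trace:
    """Hypothetical trace: a list of instructions plus optional recorded decisions."""
    instructions: List[Instruction]
    decisions: Dict[Instruction, List[int]] = field(default_factory=dict)


def sample_perfect_tile_2(extent: int, rng: random.Random) -> List[int]:
    """Pick a random factorization with outer * inner == extent."""
    divisors = [d for d in range(1, extent + 1) if extent % d == 0]
    inner = rng.choice(divisors)
    return [extent // inner, inner]


def replay(trace: Trace, rng: Optional[random.Random] = None) -> List[List[int]]:
    """Interpret the trace: a recorded decision wins, otherwise sample."""
    rng = rng or random.Random()
    results = []
    for inst in trace.instructions:
        if inst in trace.decisions:
            # Decision present: the output is uniquely determined.
            results.append(list(trace.decisions[inst]))
        else:
            # No decision: interpreting the trace introduces randomness.
            results.append(sample_perfect_tile_2(inst.extent, rng))
    return results
```

Replaying a trace whose only instruction has a recorded decision always yields that decision, while replaying the same instruction without a decision yields some valid factorization that can differ between runs.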
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668390088



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling - one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each

Review comment:
       Haha it is intentional here to say "function" instead of "trace". For example, we may replay the following function with python control flow and other interesting statements:
   
   ```python
   def sch_fn(sch: tir.Schedule):
       ...
       if some_condition:  # python control flow
           ...  # schedule strategy A
       else:
           ...  # schedule strategy B
       ...
       print(tvm.script.asscript(sch.mod))  # other statements
   ```
   
   The function replaying is designed specifically to meet the needs of such functions. Otherwise, users could use "Random search by replaying traces" in the next subsection.
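
As a rough, self-contained illustration of function replay, the sketch below re-invokes an opaque schedule function whose Python control flow depends on a sampled value. `ToySchedule`, its methods and `replay_function` are hypothetical names invented for this example; they only log calls and are not the real `tir.Schedule` API.

```python
import random
from typing import Callable, List, Tuple


class ToySchedule:
    """Hypothetical stand-in for a schedule object; it only logs primitive calls."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)
        self.log: List[Tuple] = []

    def sample_categorical(self, candidates: List[int]) -> int:
        choice = self._rng.choice(candidates)
        self.log.append(("sample", choice))
        return choice

    def apply(self, primitive: str) -> None:
        self.log.append((primitive,))


def sch_fn(sch: ToySchedule) -> None:
    unroll = sch.sample_categorical([0, 16, 64])
    if unroll == 0:  # Python control flow on a sampled value
        sch.apply("vectorize")  # schedule strategy A
    else:
        sch.apply("unroll")  # schedule strategy B
        sch.apply("parallel")


def replay_function(
    fn: Callable[[ToySchedule], None], num_trials: int
) -> List[List[Tuple]]:
    """Invoke the opaque function repeatedly, without analyzing its body."""
    logs = []
    for seed in range(num_trials):
        sch = ToySchedule(seed)  # fresh schedule state and seed for every replay
        fn(sch)
        logs.append(sch.log)
    return logs
```

Because the branch taken depends on the sampled value, the replays can record instruction sequences of different shapes. A single linear trace would not capture that divergence, which is why the function itself is replayed as a black box.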
   
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668326423



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design

Review comment:
       Composite schedule rule. Ansor's SearchRule is not compatible with our system, because it is designed to work on Ansor's own internal DSL and IR.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661928675



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,367 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more naturally by writing one or more schedule functions in meta schedule with sampling instructions. The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware platform.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to search, exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without making any use of the trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), repetitively mutates the random decisions of a random trace and re-executes the traces.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. Can be a python function that does the schedule
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. A decorator as the syntactic sugar if the `initialize` method is empty, which converts the function to the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan

Review comment:
       Yeah it makes sense to me. Removing it from the RFC







[GitHub] [tvm-rfcs] tkonolige commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664918470



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more naturally by writing one or more schedule functions in meta schedule with sampling instructions. The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware platform.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to search, exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without making any use of the trace.

Review comment:
       Looks good, but I'm a little unclear on why you repetitively replay traces. Shouldn't a single replay be enough?
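One possible reading of why a single replay may not suffice, sketched under assumptions: if the schedule function contains sampling instructions, each replay draws fresh random decisions and therefore lands on a different point of the design space. The code below is a toy model in plain Python, not TVM code. `perfect_tiles` and `replay` are hypothetical helpers, and the trace is reduced to tuples of `(instruction, loop_extent, n)`.

```python
import random
from typing import List, Tuple


def perfect_tiles(extent: int, n: int) -> List[Tuple[int, ...]]:
    """Enumerate every ordered way to write `extent` as a product of n positive integers."""
    if n == 1:
        return [(extent,)]
    tiles = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                tiles.append((factor,) + rest)
    return tiles


def replay(trace, rng: random.Random) -> Tuple[Tuple[int, ...], ...]:
    """Re-execute a toy trace; every sampling instruction draws a new decision."""
    decisions = []
    for instruction in trace:
        if instruction[0] == "sample_perfect_tile":
            _, extent, n = instruction
            decisions.append(rng.choice(perfect_tiles(extent, n)))
        # Deterministic instructions (split, reorder, ...) add no randomness.
    return tuple(decisions)


trace = [
    ("sample_perfect_tile", 16, 2),
    ("sample_perfect_tile", 8, 2),
    ("reorder",),
]
rng = random.Random(0)

# Each replay yields one point; repeating it explores more of the space.
points = {replay(trace, rng) for _ in range(200)}
print(f"{len(points)} distinct points seen in 200 replays")
```

Under this model, a schedule with no sampling instructions always replays to the same single point, so one replay would indeed be enough in that case.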







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668335454



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three

Review comment:
       Yeah, I agree. The RFC itself is more related to "a DSL that defines a search space", not task extraction / lowering (I suppose it is part of TECompiler?), but it will be helpful to demonstrate the entire search process.
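
       As a rough illustration of what "the entire search process" could look like end to end, here is a toy sketch in plain Python. It does not use TVM at all: `candidate_tilings`, `toy_cost`, and `toy_search` are made-up names, and the cost function is only a stand-in for building and measuring a kernel on a device.

```python
import random


def candidate_tilings(extent):
    """Design space: every (outer, inner) pair whose product is `extent`."""
    return [(f, extent // f) for f in range(1, extent + 1) if extent % f == 0]


def toy_cost(tiling):
    """Placeholder for a real measurement; this one simply prefers balanced tiles."""
    outer, inner = tiling
    return abs(outer - inner)


def toy_search(extent, num_trials, seed=0):
    """Sample points from the design space, 'measure' each, and keep the best one seen."""
    rng = random.Random(seed)
    space = candidate_tilings(extent)
    best = None
    for _ in range(num_trials):
        tiling = rng.choice(space)
        cost = toy_cost(tiling)
        if best is None or cost < best[1]:
            best = (tiling, cost)
    return best


best_tiling, best_cost = toy_search(extent=64, num_trials=50)
print(best_tiling, best_cost)
```

       A real flow would presumably replace `candidate_tilings` with the space induced by the sampling instructions and `toy_cost` with on-device measurement.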







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648491301



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)

Review comment:
       I think it means the tile size is always divisible by `n`.
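
       Another reading that fits the snippet above (the sampled list is passed straight to `split` as `factors`, and `n=4` yields four loops) is that `n` is the number of tiles and "perfect" means the factors multiply exactly to the loop extent. A small plain-Python sketch under that assumption follows; `perfect_tiles` and `sample_perfect_tile_sketch` are hypothetical helpers, not the TVM implementation.

```python
import random


def perfect_tiles(extent, n):
    """Return every length-n tuple of positive integers whose product equals `extent`."""
    if n == 1:
        return [(extent,)]
    result = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            # Fix the outermost factor, then tile the remaining extent with n - 1 factors.
            for rest in perfect_tiles(extent // factor, n - 1):
                result.append((factor,) + rest)
    return result


def sample_perfect_tile_sketch(extent, n, rng=random):
    """Pick one perfect tiling uniformly at random (an assumed sampling distribution)."""
    return list(rng.choice(perfect_tiles(extent, n)))


print(perfect_tiles(8, 2))
print(sample_perfect_tile_sketch(1024, 4))
```

       Under this reading no remainder loop is needed, because every sampled tiling covers the loop extent exactly.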







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676873020



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Taking the code snippet in the previous section as an example, a sequence of `split`s
+is invoked, followed by a `reorder`, and all of these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used against the word *primitive*, which means it is a transformation *composed* of those

Review comment:
       Thanks for helping me rephrase! It reads much better :-)







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668383535



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space

Review comment:
       `Trace` is more like an underlying mechanism, which is not strictly user-facing, so that's why I put it in the reference-level rather than the guide-level explanation. What do you think?
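
       To make the "underlying mechanism" point concrete, here is a toy sketch of a trace as an ordered record of instructions plus the decisions made by sampling instructions. It is plain Python, not the TVM API: `ToyTrace` and `run_schedule` are made-up names, and the candidate list is hard-coded for a loop of extent 8.

```python
import random


class ToyTrace:
    """Hypothetical stand-in: an ordered record of instructions and sampling decisions."""

    def __init__(self):
        self.instructions = []

    def append(self, name, inputs, decision=None):
        self.instructions.append({"name": name, "inputs": inputs, "decision": decision})

    def decisions(self):
        # Only sampling instructions carry a decision.
        return [inst["decision"] for inst in self.instructions if inst["decision"] is not None]


def run_schedule(rng, replay=None):
    """Run a tiny schedule function, either sampling fresh decisions or replaying old ones."""
    trace = ToyTrace()
    pending = list(replay) if replay is not None else None

    def sample(name, inputs, candidates):
        decision = pending.pop(0) if pending is not None else rng.choice(candidates)
        trace.append(name, inputs, decision)
        return decision

    # One sampling instruction followed by one deterministic primitive.
    i_tiles = sample("sample_perfect_tile", ("i", 2), [(1, 8), (2, 4), (4, 2), (8, 1)])
    trace.append("split", ("i", i_tiles))
    return trace


first = run_schedule(random.Random(0))
again = run_schedule(random.Random(1), replay=first.decisions())
print(first.instructions == again.instructions)
```

       The user only writes the schedule function; the record and the replay happen behind the scenes, which is the sense in which the trace is not strictly user-facing.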







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661745455



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.

Review comment:
       That makes sense. Thanks!







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648506005



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.

Review comment:
       I see. Then it's better to clarify it here and differentiate replay and random search.
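
To make the distinction concrete, here is a minimal sketch in plain Python, not the TVM API. `ToySchedule`, `perfect_tiles`, `program_replay` and `random_search` are hypothetical names invented for illustration. The assumption is that program replay re-runs the user's schedule function on every trial, while random search runs it once, extracts the trace, and then re-draws only the sampling decisions recorded in that trace.

```python
import random


def perfect_tiles(extent, n):
    """All ordered ways to write `extent` as a product of `n` positive integers."""
    if n == 1:
        return [[extent]]
    tiles = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                tiles.append([factor] + rest)
    return tiles


class ToySchedule:
    """Hypothetical stand-in for a schedule class that records a trace."""

    def __init__(self, rng):
        self.rng = rng
        self.trace = []      # instructions, in execution order
        self.decisions = []  # one decision per sampling instruction

    def sample_perfect_tile(self, extent, n):
        self.trace.append(("sample_perfect_tile", extent, n))
        decision = self.rng.choice(perfect_tiles(extent, n))
        self.decisions.append(decision)
        return decision

    def split(self, loop, factors):
        self.trace.append(("split", loop))


def schedule_program(sch):
    """The user-written schedule function (toy version of the matmul tiling)."""
    sch.split("i", sch.sample_perfect_tile(64, 4))
    sch.split("k", sch.sample_perfect_tile(16, 2))


def program_replay(num_trials, seed=0):
    """Re-run the whole schedule function for every trial."""
    rng = random.Random(seed)
    points = []
    for _ in range(num_trials):
        sch = ToySchedule(rng)
        schedule_program(sch)
        points.append(sch.decisions)
    return points


def random_search(num_trials, seed=0):
    """Run the schedule function once, then re-draw decisions from its trace."""
    rng = random.Random(seed)
    sch = ToySchedule(rng)
    schedule_program(sch)
    sampling = [inst for inst in sch.trace if inst[0] == "sample_perfect_tile"]
    points = []
    for _ in range(num_trials):
        points.append([rng.choice(perfect_tiles(extent, n)) for _, extent, n in sampling])
    return points
```

In this toy model both strategies draw from the same space; they differ only in whether the user program or the extracted trace is re-executed.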







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676875856



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulate its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
 
-As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
-in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
-used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
-equivalent to applying composite schedule rules to each block in TensorIR.
+Composite schedule rules work in a similar way when scheduling TensorIR, as introduced in Section 3.2.
+They analyze the TensorIR and apply schedule primitives directly to it accordingly.
+When applying such rules to each TensorIR block in certain order (e.g. Post-DFS Order),

Review comment:
       It is rare but possible for developers to write their own order, although I haven't seen any meaningful use case for changing it. In other words, we can probably provide post-DFS order as a builtin while still allowing further extension.
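
A rough sketch of that idea in plain Python follows. It is not TVM code: `post_dfs_order`, `apply_rules` and the dictionary-based block graph are assumptions made up for illustration. The traversal order is passed as an argument, with post-order DFS as the default, so a developer could supply a different order without touching the rule-application loop.

```python
def post_dfs_order(producers, roots):
    """Post-order DFS: every block is listed after the blocks it depends on.

    `producers` maps a block name to the names of the blocks it reads from.
    """
    visited, order = set(), []

    def visit(block):
        if block in visited:
            return
        visited.add(block)
        for producer in producers.get(block, []):
            visit(producer)
        order.append(block)

    for root in roots:
        visit(root)
    return order


def apply_rules(producers, roots, rules, order_fn=post_dfs_order):
    """Apply every rule to every block, visiting blocks in `order_fn` order."""
    log = []
    for block in order_fn(producers, roots):
        for rule in rules:
            log.append(rule(block))
    return log


# Toy usage: relu consumes matmul, which consumes A and B.
graph = {"relu": ["matmul"], "matmul": ["A", "B"]}
visited_blocks = apply_rules(graph, ["relu"], [lambda block: block])
```

A custom order is then just another function with the same signature as `post_dfs_order`.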







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-871602430


   Sorry for the delay! I am getting back to this RFC and will be working on it for the rest of this week.





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647850828



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. As a result, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it, making every sampling decision purely at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * CUDA has a hard requirement that the number of threads per block must not exceed 1024, but that number is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.

Review comment:
       Yes. I mentioned it in Section 8 "supporting control flow and assertions"
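
For illustration, the thread-limit check could look roughly like the following plain-Python sketch. The names `verify_thread_limit` and `filter_candidates`, and the representation of a candidate as a mapping from thread axis to sampled extent, are assumptions for this example rather than the actual postprocessor interface. Here the check returns a boolean instead of raising, which is one possible way to reject a candidate.

```python
from math import prod

# Limit on threads per block, as stated in the RFC text.
MAX_THREADS_PER_BLOCK = 1024


def verify_thread_limit(thread_extents):
    """Return True if the sampled thread extents fit in one CUDA block.

    `thread_extents` maps a thread axis name to its sampled extent; the
    values are only known after the sampling instructions have run.
    """
    return prod(thread_extents.values()) <= MAX_THREADS_PER_BLOCK


def filter_candidates(candidates):
    """Drop every candidate whose sampled extents violate the limit."""
    return [c for c in candidates if verify_thread_limit(c)]


candidates = [
    {"threadIdx.x": 32, "threadIdx.y": 32},  # 1024 threads: accepted
    {"threadIdx.x": 64, "threadIdx.y": 32},  # 2048 threads: rejected
]
valid = filter_candidates(candidates)
```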







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661789516



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. As a result, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it, making every sampling decision purely at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * CUDA has a hard requirement that the number of threads per block must not exceed 1024, but that number is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can easily be swapped for a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be described by a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in Python**. We provide two ways to define a composite schedule in Python:
+
+Method 1. A simple decorator that converts a Python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...

Review comment:
       The decorator is just syntactic sugar of constructing this class. I will update the doc to clarify
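
       To illustrate what "syntactic sugar" could mean here, below is a minimal, self-contained sketch. It is not the actual TVM implementation: `Schedule`, `BlockRV` and `PyCompositeSchedule` are hypothetical stand-ins defined locally, and `as_composite_schedule` is a plain function here rather than `tir.as_composite_schedule`. The only point is that the decorator builds a subclass whose `apply` forwards to the decorated function.

```python
class Schedule:
    """Hypothetical stand-in for the real schedule class."""


class BlockRV:
    """Hypothetical stand-in for a block random variable."""


class PyCompositeSchedule:
    """Hypothetical base class; only the shape of the interface matters here."""

    name = "composite"

    def initialize(self):
        # Optional hook; the function-based form simply keeps this no-op.
        pass

    def apply(self, sch, block):
        raise NotImplementedError


def as_composite_schedule(name):
    """Turn a plain function into an instance of a PyCompositeSchedule subclass."""

    def decorator(func):
        class _Wrapped(PyCompositeSchedule):
            def apply(self, sch, block):
                # Forward to the user-provided function.
                return func(sch, block)

        _Wrapped.name = name
        _Wrapped.__name__ = func.__name__
        return _Wrapped()

    return decorator


@as_composite_schedule(name="multi-level-tiling")
def multi_level_tiling(sch, block):
    return sch  # a real rule would transform the schedule here
```

       With this shape, Method 1 covers the common case where only `apply` is needed, while Method 2 remains the way to override `initialize` or keep extra state.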







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676861423



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Take the code snippet in the previous section as an example: a sequence of `split`s
+is invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used in contrast to the word *primitive*: it denotes a transformation *composed* of those
+*primitives*.
+
+For example, suppose there is a composite schedule called `Inline-All-Elementwise-Operations`, which
+inlines all the elementwise computation into their consumers. Applying it to the following TensorIR:
+
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("B") ...:
+      B[i, j] = A[i, j] + 1
+  for i, j in ...:
+    with tir.Block("C") ...:
+      C[i, j] = B[i, j] + 1
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = C[i, j] + 1
+
+sch = tir.Schedule(example_func)
+InlineAllElementwiseOperations().apply(sch, sch.get_block("D"))
+print(tvm.script.asscript(sch.mod))
+```
+
+The result after applying the composite schedule is:
 
-To make it more convenient and modular, we allow users to register "composite schedules" that apply
-a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
-schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = A[i, j] + 1 + 1 + 1
+```
 
 ### 3.3. AutoTVM-style Design Space Description
 
-Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
-these instructions parametrize the schedule from a single deterministic point to a space supported
-by random variables (tile size, etc.), making it possible for developers to describe the design
-space with meta schedule APIs.
+Meta schedule extends the schedule DSL with a set of new schedule primitives with randomness,
+called **sampling instructions**. These primitives do not transform the TensorIR,
+but instead will generate random decisions from specific distributions in each run,

Review comment:
       added:
   > in terms of tiling strategies, fusion levels, unroll lengths, etc.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676824762



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.

Review comment:
       A TE stage is the schedule corresponding to a TE operation defined by `te.compute(...)`. Perhaps I should put a link to the doc of TE stage for reference: https://tvm.apache.org/docs/api/python/te.html#tvm.te.Stage.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661735327



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and AutoScheduler (Ansor). Meta Schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.

Review comment:
       Adjusted accordingly. Thanks a lot!







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648490438



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and AutoScheduler (Ansor). Meta Schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the design space of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.

Review comment:
       I think the statement "each trace is a design space" would be confusing. If a trace can be analogized to a sketch, then the correct statement is "each trace forms a design space", meaning that we are able to generate a set of design points from a trace. If so, then "design space" is the right term here, and you could ignore the second paragraph.
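
       To make the "each trace forms a design space" reading concrete, here is a small sketch of how a set of design points could be enumerated from one trace. This is only an illustration under my own assumptions, not how the system extracts the space: the tuple-based trace encoding, the loop extents (8 and 4) and the helper `perfect_tiles` are all hypothetical.

```python
from itertools import product


def perfect_tiles(extent, n):
    """Yield every way to write `extent` as an ordered product of `n` positive factors."""
    if n == 1:
        yield (extent,)
        return
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                yield (factor,) + rest


# Hypothetical trace encoding: (instruction name, loop extent, number of tiles).
# Deterministic instructions carry no decision and only have a name.
trace = [
    ("Sample-Perfect-Tile", 8, 2),
    ("Sample-Perfect-Tile", 4, 2),
    ("Split",),
    ("Split",),
    ("Reorder",),
]

# Only the sampling instructions contribute dimensions to the space.
sampling = [inst for inst in trace if inst[0] == "Sample-Perfect-Tile"]

# One design point = one concrete decision for every sampling instruction.
design_space = list(product(*(perfect_tiles(e, n) for _, e, n in sampling)))
```

       In this toy case the single trace yields a finite set of points (one per combination of tile decisions), which is the sense in which a trace "forms" a design space rather than "is" one.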







[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648492658



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and AutoScheduler (Ansor). Meta Schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.

Review comment:
       Well in this case I guess reorganize them to be "schedule" (3.1, 3.3) and "tunable schedule" (3.2, 3.4) might be clear.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668323104



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and variety
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally

Review comment:
       But the context here is not really related to meta schedule itself (it is more relevant to TensorIR).







[GitHub] [tvm-rfcs] junrushao1994 edited a comment on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 edited a comment on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-875121662


   @tkonolige thanks for the review!
   
   > The biggest one I can think of is the fact that we will have to rewrite all our existing schedules to take advantage of the new infrastructure.
   
   Given that most of the time we use sketch generation (in Ansor's terminology) to generate schedules automatically, we can just remove most of the schedules written in TE. On the other hand, we do need to rewrite all of Ansor's sketch rules (defined in `src/auto_scheduler/search_policy/sketch_policy_rules.h`), including:
   - Always-Inline
   - Multi-Level-Tiling
   - Multi-Level-Tiling-with-Fusion
   - Add-Cache-Read
   - Add-Cache-Write
   - Add-RFactor
   - Simplify-Compute-with-Const-Tensor
   - Cross-Thread-Reduction
   - Special-Compute-Location
   
   Upstreaming those rules is part of the upstreaming process, so I suppose it won't be a big problem :-)
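
To make the idea of a rewritten sketch rule concrete, here is a toy sketch in plain Python. It is not TVM code: `Block`, `Trace`, `apply_rules`, and the conditions inside the two rule functions are hypothetical stand-ins chosen for illustration, and only the rule names are borrowed from the list above. It shows a rule as a function that inspects a block and decides which schedule primitives to record.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Block:
    """Hypothetical stand-in for a TensorIR block (not a TVM class)."""
    name: str
    is_elementwise: bool = False
    is_output: bool = False


@dataclass
class Trace:
    """Records which primitives the rules decided to apply."""
    steps: List[str] = field(default_factory=list)


def always_inline(block: Block, trace: Trace) -> bool:
    # Assumed condition: inline element-wise blocks that are not outputs.
    if block.is_elementwise and not block.is_output:
        trace.steps.append(f"compute_inline({block.name})")
        return True
    return False


def multi_level_tiling(block: Block, trace: Trace) -> bool:
    # Assumed condition: tile every block that is not element-wise.
    if not block.is_elementwise:
        trace.steps.append(f"multi_level_tiling({block.name}, structure='SSRSRS')")
        return True
    return False


def apply_rules(blocks: List[Block], rules: List[Callable[[Block, Trace], bool]]) -> Trace:
    """Apply the rules to each block; the first rule that matches wins (toy policy)."""
    trace = Trace()
    for block in blocks:
        for rule in rules:
            if rule(block, trace):
                break
    return trace


blocks = [
    Block("pad", is_elementwise=True),
    Block("matmul"),
    Block("relu", is_elementwise=True, is_output=True),
]
trace = apply_rules(blocks, [always_inline, multi_level_tiling])
for step in trace.steps:
    print(step)
```

Under this toy policy, porting a rule amounts to writing one more function with the same signature and adding it to the list passed to `apply_rules`.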
   
   > Also, will tuning be slower if we allow users to define their own search rules?
   
   The search rule is executed only once to obtain the search space before we explore it, and it is usually fairly fast (within a second). So if we only customize our own search rule (i.e. sketch rule in Ansor, schedule rule in meta schedule), we won't observe performance degradation.
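
The point above can be illustrated with a small, self-contained sketch. It is plain Python rather than TVM code, and every name in it (`perfect_tiles`, `generate_design_space`, `sample_candidate`) is hypothetical. It assumes the design space is simply the set of exact loop factorizations. The space is built once, and the exploration loop then draws as many candidates as it likes without re-running the rule.

```python
import random
from math import prod


def perfect_tiles(extent, n):
    """Enumerate every ordered way to write `extent` as a product of n factors."""
    if n == 1:
        return [[extent]]
    result = []
    for f in range(1, extent + 1):
        if extent % f == 0:
            for rest in perfect_tiles(extent // f, n - 1):
                result.append([f] + rest)
    return result


def generate_design_space(loop_extents, levels):
    """Stand-in for applying a search rule: runs once, before exploration."""
    return {name: perfect_tiles(extent, levels[name])
            for name, extent in loop_extents.items()}


def sample_candidate(space, rng):
    """Stand-in for one exploration step: pick one tiling per loop."""
    return {name: rng.choice(choices) for name, choices in space.items()}


loop_extents = {"i": 1024, "j": 1024, "k": 512}
levels = {"i": 4, "j": 4, "k": 2}

space = generate_design_space(loop_extents, levels)   # executed once
rng = random.Random(0)
candidates = [sample_candidate(space, rng) for _ in range(1000)]  # executed many times

print({name: len(choices) for name, choices in space.items()})
```

In this toy setting the cost of the rule is paid once up front, so making the rule more elaborate changes only the setup time and not the per-candidate cost of exploration.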
   
   





[GitHub] [tvm-rfcs] areusch commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r665727475



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:

Review comment:
       this should be a subheading, i think, but it renders as bold text. can you use `### Problems with the current scheduling system`?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the

Review comment:
       could you say more about why it's "probabilistic"? is the DSL the probabilistic thing, or the scheduling algorithm? if the latter, perhaps we could say "a scheduling DSL for TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor) to facilitate probabilistic scheduling."

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the

Review comment:
       if you remove the word "may" here and in the sentence below, it'll read as prescriptive--e.g. "this is how we expect developers to work with Meta Schedule when doing manual scheduling." right now, it just suggests how someone _could_ interact with meta schedule, but the reader is looking for something firmer (e.g. what is _the_ way to use meta schedule for manual scheduling?).

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are

Review comment:
       are we not still scheduling TensorIR here? why do you include "in TensorIR scheduling" in the sentence? if we are not, i suggest explicitly stating this in the introductory sentence above e.g. "AutoScheduler (Ansor) schedules TE by ..."

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are more naturally
+represented by writing one or more schedule functions in meta schedule with sampling
+instructions. The execution traces generated by the schedule functions are the design space to be

Review comment:
       What's an execution trace? It hasn't been introduced anywhere before this point.
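
       For context, a toy sketch of what such a trace might record, assuming it is simply the ordered list of sampling and primitive calls applied to a schedule. `ToySchedule` and its methods are hypothetical stand-ins written in plain Python; they are not TVM APIs and only mimic the names used in the RFC text:

```python
import random


class ToySchedule:
    """Hypothetical stand-in for a traced schedule (not a TVM class)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.trace = []  # entries are (instruction name, inputs, decision)

    def sample_perfect_tile(self, extent, n):
        # Draw n factors whose product is exactly `extent`.
        factors, remaining = [], extent
        for _ in range(n - 1):
            divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
            factor = self.rng.choice(divisors)
            factors.append(factor)
            remaining //= factor
        factors.append(remaining)
        self.trace.append(("sample_perfect_tile", (extent, n), tuple(factors)))
        return factors

    def split(self, loop, factors):
        # Deterministic primitive: only recorded here, no IR is transformed.
        self.trace.append(("split", (loop, tuple(factors)), None))
        return [f"{loop}_{idx}" for idx in range(len(factors))]

    def reorder(self, *loops):
        self.trace.append(("reorder", loops, None))


sch = ToySchedule(seed=0)
i_tiles = sch.sample_perfect_tile(extent=128, n=2)
i_0, i_1 = sch.split("i", i_tiles)
sch.reorder(i_0, i_1)

# Print the recorded instructions in the order they were applied.
for index, (name, inputs, decision) in enumerate(sch.trace):
    print(f"Instruction {index}. {name} inputs={inputs} decision={decision}")
```

       In this reading, only the sampling entry carries a random decision; the other entries are deterministic given that decision.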

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design

Review comment:
       do you mean SearchRule or composite schedule rule?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are more naturally
+represented by writing one or more schedule functions in meta schedule with sampling
+instructions. The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+may change across runs. Effectively, it is equivalent to
+random exploration without a trace, giving users the flexibility to define arbitrary functions
+that a trace may not support well (e.g. control flow divergence based on the value of intermediate
+random variables), but it rules out any future trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present on the

Review comment:
       Could you define "replayed"? I think you mean here that the scheduling process is re-run from the trace many times.
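
       One possible reading of "replay", sketched as a toy in plain Python: the recorded instruction list is re-executed from the top, and each sampling instruction draws a fresh random decision on every run, so one trace yields many candidate schedules. The trace format and the `replay` helper below are assumptions made for illustration; they are not the RFC's interpreter or any TVM API:

```python
import random

# A toy trace: (instruction name, argument). The format is hypothetical.
TRACE = [
    ("sample_tile", 64),  # sampling instruction: pick a divisor of 64
    ("split", "i"),       # deterministic primitive that consumes the sample
]


def replay(trace, seed):
    """Re-execute every instruction in order, re-drawing random decisions."""
    rng = random.Random(seed)
    applied = []
    last_sample = None
    for name, arg in trace:
        if name == "sample_tile":
            divisors = [d for d in range(1, arg + 1) if arg % d == 0]
            last_sample = rng.choice(divisors)
            applied.append((name, last_sample))
        elif name == "split":
            applied.append((name, arg, last_sample))
        else:
            raise ValueError(f"unknown instruction: {name}")
    return applied


# Replaying the same trace with different seeds gives different candidates.
candidates = [replay(TRACE, seed) for seed in range(5)]
for candidate in candidates:
    print(candidate)
```

       Under this reading, fixing the seed (or the recorded decisions) reproduces one point in the design space, while varying it explores the space.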

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing so that the execution is recorded
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet
+of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+may change across runs. Effectively, it is equivalent to random exploration without a trace. It
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out
+any trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for

Review comment:
       "rules" sounds like SearchRule. could you say something like "there may be extra rule-specific logic to execute to either validate or tweak the produced schedule."
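   For illustration only, a minimal pure-Python sketch of what "validate or tweak" logic could
   look like. `Candidate`, `reject_oversized_tiles`, `clamp_unroll` and `run_postprocs` are
   hypothetical names made up for this example, not meta schedule APIs; the sketch assumes a
   postprocessor either returns a (possibly adjusted) candidate or rejects it by returning `None`.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a schedule produced by executing a trace."""
    tile_sizes: Tuple[int, ...]
    unroll_step: int


# A postprocessor returns a (possibly tweaked) candidate, or None to reject it.
Postproc = Callable[[Candidate], Optional[Candidate]]


def reject_oversized_tiles(limit: int) -> Postproc:
    """Validation: drop candidates whose tile volume exceeds `limit`."""
    def apply(c: Candidate) -> Optional[Candidate]:
        volume = 1
        for t in c.tile_sizes:
            volume *= t
        return c if volume <= limit else None
    return apply


def clamp_unroll(max_step: int) -> Postproc:
    """Tweak: cap the unroll step instead of rejecting the candidate."""
    def apply(c: Candidate) -> Optional[Candidate]:
        return replace(c, unroll_step=min(c.unroll_step, max_step))
    return apply


def run_postprocs(c: Candidate, postprocs: List[Postproc]) -> Optional[Candidate]:
    """Apply postprocessors in order; stop as soon as one rejects the candidate."""
    current: Optional[Candidate] = c
    for p in postprocs:
        current = p(current)
        if current is None:
            return None
    return current
```

   With `[reject_oversized_tiles(1024), clamp_unroll(16)]`, a candidate with tiles `(8, 8, 8)`
   and unroll step 64 is kept with its unroll step capped at 16, while tiles `(32, 32, 8)` are
   rejected outright.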

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop
+  a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing so that the execution is recorded
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet
+of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+may change across runs. Effectively, it is equivalent to random exploration without a trace. It
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out
+any trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets

Review comment:
       could you summarize with something more explanatory than "more efficient"? the reader can probably reason about the efficiency themselves, or you could place that at the end. what's important to communicate here is: what is the characteristic that makes this different from the others? maybe "An exploration strategy that uses iterative rules" or something?
   
   also, consider removing "we" here and replace with the name of the person. We is the user, not you :).
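   To make the distinguishing characteristic concrete, here is a rough pure-Python sketch, not
   the RFC's implementation: candidates are mutated into neighbors, and a cost model decides
   which ones survive to the next round. `Point`, `mutate`, `evolutionary_search` and the toy
   cost model in the usage note are all made up for illustration, and a design point is assumed
   to be a perfect tiling of a single loop.

```python
import random
from typing import Callable, Iterable, Tuple

# Hypothetical design point: tile sizes whose product equals the loop extent.
Point = Tuple[int, ...]


def mutate(point: Point, rng: random.Random) -> Point:
    """Jump to a neighbor by moving a factor of 2 from one tile to another."""
    sizes = list(point)
    donors = [i for i, s in enumerate(sizes) if s % 2 == 0]
    if not donors or len(sizes) < 2:
        return point  # nothing to move, stay at the same point
    src = rng.choice(donors)
    dst = rng.choice([i for i in range(len(sizes)) if i != src])
    sizes[src] //= 2
    sizes[dst] *= 2
    return tuple(sizes)


def evolutionary_search(
    init: Iterable[Point],
    cost_model: Callable[[Point], float],
    rounds: int = 20,
    population: int = 8,
    seed: int = 0,
) -> Point:
    """Iteratively mutate the pool and keep the points the cost model ranks best."""
    rng = random.Random(seed)
    pool = list(init)
    for _ in range(rounds):
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # Parents stay in the ranking, so the best predicted cost never gets worse.
        pool = sorted(set(pool) | set(children), key=cost_model)[:population]
    return pool[0]
```

   For example, with a toy cost model `lambda p: abs(p[-1] - 8)` and the single starting point
   `(128, 1)`, the search only ever visits tilings whose product is 128 and returns one that the
   cost model scores no worse than the starting point. Unlike the two random strategies, the
   next candidates depend on how earlier candidates were ranked.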

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. Can be a python function that does the schedule
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database

Review comment:
       it would be great to add the answers to these questions to the RFC rather than requiring a curious reader to find these in resolved conversations on the PR review. in particular, Database has never been mentioned anywhere in this RFC. you're clearly missing a system-level diagram, if this is the case.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
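
The manual enumeration described above can be sketched in plain Python. This is only an illustration and not part of the proposed API: `perfect_tilings` is a hypothetical helper that lists every way to split a loop extent into `n` factors whose product equals the extent, which is the set of tile sizes a developer would otherwise write out by hand.

```python
from typing import List


def perfect_tilings(extent: int, n: int) -> List[List[int]]:
    """Hypothetical helper: all ordered lists of n positive factors whose product is extent."""
    if n == 1:
        return [[extent]]
    results = []
    for f in range(1, extent + 1):
        if extent % f == 0:
            # Fix the outermost factor, then tile what remains with n - 1 factors
            for rest in perfect_tilings(extent // f, n - 1):
                results.append([f] + rest)
    return results


# Candidate tilings for a reduction loop of extent 2048 split into 2 tiles;
# the hand-picked k_tiles = [256, 8] from the example is one of them.
candidates = perfect_tilings(2048, 2)
print(len(candidates), [256, 8] in candidates)
```

Each candidate would still have to be applied with `sch.split` and measured on the device, which is the tedious part that the sampling instructions in Section 3.3 are meant to automate.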
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the design space of meta schedule, and meta schedule further allows mixing those
+three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
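
To make the idea of "a trace forms a design space" concrete, here is a toy sketch in plain Python. It does not use the TVM API: `ToyTrace`, `sample_perfect_tile` and `replay` are hypothetical stand-ins, and the loop extents of 1024 are an assumption inferred from the decisions printed above (each decision multiplies to 1024). Replaying the same trace with fresh random decisions yields a different point in the space.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTrace:
    """Hypothetical stand-in for a trace: loops to tile plus the recorded decisions."""
    extents: Dict[str, int]
    n_tiles: Dict[str, int]
    decisions: Dict[str, List[int]] = field(default_factory=dict)


def sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Randomly pick n factors whose product equals extent."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)  # the last factor absorbs whatever is left
    return factors


def replay(trace: ToyTrace, rng: random.Random) -> ToyTrace:
    """Re-run every sampling instruction with new decisions: a new point in the design space."""
    new_decisions = {
        loop: sample_perfect_tile(extent, trace.n_tiles[loop], rng)
        for loop, extent in trace.extents.items()
    }
    return ToyTrace(trace.extents, trace.n_tiles, new_decisions)


trace = ToyTrace(extents={"i": 1024, "j": 1024, "k": 1024}, n_tiles={"i": 4, "j": 4, "k": 2})
point = replay(trace, random.Random(0))
for loop, tiles in point.decisions.items():
    print(loop, tiles, math.prod(tiles))
```

The deterministic instructions (`split`, `reorder`) are omitted here because they do not add points to the space; only the sampling decisions do.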
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++ or Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, then
+each invocation may result in a different IRModule after scheduling, because the random decisions
+can change across runs. Effectively, this is equivalent to random exploration without a trace. It
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads
+    per block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out
+    when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict its performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
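
The loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the proposed implementation: candidates are plain lists of tile sizes, and `mutate`, `postprocess` and `predict` are hypothetical stand-ins for a mutator, a postprocessor and a cost model. The epsilon-greedy step keeps mostly top-scored candidates and reserves a small share of each generation for randomly chosen ones.

```python
import math
import random
from typing import Callable, List, Optional

Candidate = List[int]  # toy representation: tile sizes of one loop


def evolutionary_search(
    init: List[Candidate],
    mutate: Callable[[Candidate, random.Random], Candidate],
    postprocess: Callable[[Candidate], Optional[Candidate]],
    predict: Callable[[Candidate], float],
    rng: random.Random,
    iterations: int = 4,
    population: int = 8,
    epsilon: float = 0.2,
) -> List[Candidate]:
    pool = list(init)
    for _ in range(iterations):
        children = []
        for parent in pool:
            child = postprocess(mutate(parent, rng))
            if child is not None:  # postprocessor rejected the candidate
                children.append(child)
        ranked = sorted(pool + children, key=predict, reverse=True)
        n_random = int(population * epsilon)
        best = ranked[: population - n_random]   # exploitation
        rest = ranked[population - n_random:]
        explore = rng.sample(rest, min(n_random, len(rest)))  # exploration
        pool = best + explore
    return sorted(pool, key=predict, reverse=True)


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    # Move a factor of 2 from one tile to another, keeping the product unchanged
    src, dst = rng.sample(range(len(c)), 2)
    out = list(c)
    if out[src] % 2 == 0:
        out[src] //= 2
        out[dst] *= 2
    return out


def postprocess(c: Candidate) -> Optional[Candidate]:
    # Toy resource check: reject candidates whose innermost tile is too large
    return c if c[-1] <= 16 else None


def predict(c: Candidate) -> float:
    # Toy cost model: prefers an innermost tile close to 8
    return -abs(c[-1] - 8)


result = evolutionary_search([[1024, 1, 1, 1]], mutate, postprocess, predict, random.Random(0))
print(result[0], math.prod(result[0]))
```

In the real system the candidates returned at the end would be compiled and measured on device, and the measurements would feed back into the cost model; that part is left out of the sketch.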
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized python implementation. For example,
+
+**Customize design space in python**. The design space can be defined by a python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. A decorator as the syntactic sugar if the `initialize` method is empty, which converts the
+function to the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions
+can be easily plugged in.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do

Review comment:
       explain...

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Meta schedule offers a more natural representation of
+AutoTVM’s schedule templates (knobs): developers write one or more schedule functions with
+sampling instructions. The execution traces generated by the schedule functions are the design
+space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
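
The record-and-replay idea behind a trace can be sketched without TVM. The toy below is not the `tir.Schedule` trace: `Instruction`, `run`, and the `"sample"`/`"split"` instruction kinds are hypothetical names chosen for illustration. It shows one trace being replayed either with its recorded decisions (a single point) or with freshly drawn decisions (another point in the same space).

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instruction:
    kind: str                       # "sample" or "split" in this toy
    args: dict                      # static arguments of the instruction
    decision: Optional[int] = None  # recorded choice; only used by "sample"

def run(trace, rng=None):
    """Interpret a trace.

    With `rng`, every sampling decision is re-drawn; without it, the
    recorded decisions are reused. Returns the resulting splits and the
    trace annotated with the decisions actually taken.
    """
    env = {}        # random-variable name -> concrete value
    splits = []     # (loop name, factor) pairs produced by "split"
    new_trace = []
    for inst in trace:
        if inst.kind == "sample":
            if rng is not None:
                choice = rng.choice(inst.args["candidates"])
            else:
                choice = inst.decision
            env[inst.args["out"]] = choice
            new_trace.append(Instruction("sample", inst.args, choice))
        elif inst.kind == "split":
            splits.append((inst.args["loop"], env[inst.args["factor"]]))
            new_trace.append(inst)
    return splits, new_trace

trace = [
    Instruction("sample", {"out": "v0", "candidates": [1, 2, 4, 8]}, decision=4),
    Instruction("split", {"loop": "i", "factor": "v0"}),
]
fixed, _ = run(trace)                           # deterministic replay
varied, new_trace = run(trace, random.Random(1))  # re-sampled replay
print(fixed, varied)
```

Because the split refers to the random variable `v0` rather than to a literal number, the same instruction list stays valid no matter which decision is drawn.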
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to search, exhaustively or
+efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then
+each invocation may result in a different scheduled IRModule, because the random decisions can
+change across runs. Effectively, it is equivalent to random exploration without a trace. This
+gives users the flexibility to define arbitrary functions that a trace may not support well
+(e.g. control flow divergence based on the value of intermediate random variables), but it rules
+out any trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeatedly replaying those traces is equivalent to exploring
+the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads
+    per block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out
+    when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
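
The loop described above can be sketched in a few lines of plain Python. Everything here is a stand-in chosen for illustration, not the RFC's implementation: a "schedule" is reduced to a pair of thread extents, `mutate` plays the mutator, `postprocess` plays the CUDA thread-limit check, and `predicted_score` is a made-up cost model that happens to prefer 256 threads. Compilation and on-device measurement are omitted.

```python
import random

def mutate(point, rng):
    """Mutator: double or halve one randomly chosen extent (a 'neighbor')."""
    new = list(point)
    idx = rng.randrange(len(new))
    new[idx] = max(1, new[idx] * 2 if rng.random() < 0.5 else new[idx] // 2)
    return tuple(new)

def postprocess(point, max_threads=1024):
    """Postprocessor: reject points whose thread count exceeds the limit."""
    return point if point[0] * point[1] <= max_threads else None

def predicted_score(point):
    """Hypothetical cost model: higher is better, peak at 256 threads."""
    return -abs(point[0] * point[1] - 256)

def evolutionary_search(init, iters, pop_size, eps, seed=0):
    rng = random.Random(seed)
    population = [init]
    for _ in range(iters):
        children = []
        for parent in population:
            for _ in range(4):  # a few mutations per parent
                child = postprocess(mutate(parent, rng))
                if child is not None:
                    children.append(child)
        # Rank parents and valid children together by predicted score.
        candidates = sorted(set(children + population),
                            key=predicted_score, reverse=True)
        # Epsilon-greedy: keep mostly the best, plus a few random others.
        n_greedy = max(1, int(pop_size * (1 - eps)))
        rest = candidates[n_greedy:]
        explore = rng.sample(rest, min(len(rest), pop_size - n_greedy))
        population = candidates[:n_greedy] + explore
    return max(population, key=predicted_score)

best = evolutionary_search((1, 1), iters=20, pop_size=8, eps=0.25)
print(best, predicted_score(best))
```

Keeping the top-ranked candidate in every generation means the best predicted score never decreases, while the randomly kept remainder preserves some diversity in the population.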
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be a Python function that does the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. If the `initialize` method is empty, a decorator can be used as syntactic sugar; it
+converts the function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost all components of the system are decoupled from one another, and extensions
+can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: unlike alternative solutions, we do not
+require any change to the existing codebase or any extra APIs to learn. This could lower the
+barrier to using automation systems.
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully
+automatic sketch generation provides a flexible way to balance the injection of new domain
+knowledge and automation.
+
+Flexibility in customization allows quick try-out on new tasks, new strategies and new hardware
+targets without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides
+convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying
+schedule primitives opens the door for meta schedule, our proposed new-generation auto scheduling
+system.
+
+**AutoTVM** is the 1st generation automation framework in TVM, which requires developers to
+implement per-operator scheduling templates, and the system could handle the tuning process.
+
+**AutoScheduler (Ansor)** is the 2nd generation automation framework in TVM, whose built-in rules
+could automatically generate schedule templates for almost all the operators on CPU, GPU, etc.
+
+## 8. Unresolved questions
+
+**Supporting Control Flow and Assertions**
+
+Right now the meta schedule DSL does not support control flow. Although we have not seen any
+real-world use case so far, it is possible that one could appear in some future workloads.
+
+A real-world issue we could see is that sampling may lead to wrong schedules on CUDA, e.g. the
+schedule results in a CUDA program that uses too much shared memory, too many threads, etc. In this
+case, we need to halt the program immediately. Therefore, introducing assertions may be helpful.
+
+## 9. Future possibilities
+
+**Unifying Manual Scheduling, AutoTVM and Ansor in TOPI**
+
+Meta schedule provides an idiomatic approach to unify the three existing scheduling APIs in TVM:
+
+* Manual schedules are meta schedules without sampling instructions
+* AutoTVM templates are meta schedules where knobs are replaced by sampling instructions
+* Each of Ansor’s search rules generates a snippet of a meta schedule
+
+We further allow mixing different styles of scheduling and exploring the union space, which could

Review comment:
       what's the relation to TOPI? 

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Meta schedule offers a more natural representation of
+AutoTVM’s schedule templates (knobs): developers write one or more schedule functions with
+sampling instructions. The execution traces generated by the schedule functions are the design
+space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to search, exhaustively or
+efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then
+each invocation may result in a different scheduled IRModule, because the random decisions can
+change across runs. Effectively, it is equivalent to random exploration without a trace. This
+gives users the flexibility to define arbitrary functions that a trace may not support well
+(e.g. control flow divergence based on the value of intermediate random variables), but it rules
+out any trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeatedly replaying those traces is equivalent to exploring
+the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads
+    per block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out
+    when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be a Python function that does the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. If the `initialize` method is empty, a decorator can be used as syntactic sugar; it
+converts the function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost all components of the system are decoupled from one another, and extensions
+can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: unlike alternative solutions, we do not
+require any change to the existing codebase or any extra APIs to learn. This could lower the
+barrier to using automation systems.
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully
+automatic sketch generation provides a flexible way to balance injecting new domain knowledge
+against automation.
+
+Flexibility in customization allows new tasks, new strategies and new hardware targets to be tried
+out quickly, without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides
+convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying

Review comment:
       is this different from TIR? I don't think you should make a v2 of TIR called TensorIR, because I already assumed TIR was short for TensorIR in previous materials.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta Schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility to all possible TIR schedule primitives such as
+tensorization and loop partitioning, and customizability at every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale well to hundreds of operators and a variety
+  of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure Python, C++, or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of the meta schedule DSL, and how it can be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore potential optimizations.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the

Review comment:
       in the forum example you've given, the two-loop split is just applied three times to separate axes. i'm not sure the problem is well enough illustrated by that example.
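
To make the point concrete, here is a minimal sketch in plain python (no TVM imports) of how a multi-factor split such as `factors=[16, 8, 8, 8]` reduces to repeated two-way splits. `split_two` and `split_many` are hypothetical helpers made up for illustration, not TVM APIs, and the sketch assumes the loop extent is exactly the product of the factors.

```python
from math import prod
from typing import List, Tuple


def split_two(extent: int, inner: int) -> Tuple[int, int]:
    """Split one loop into an (outer, inner) pair; `inner` must divide `extent`."""
    if inner <= 0 or extent % inner != 0:
        raise ValueError(f"{inner} does not divide {extent}")
    return extent // inner, inner


def split_many(extent: int, factors: List[int]) -> Tuple[List[int], int]:
    """Emulate an n-way split using only two-way splits.

    Returns the resulting loop extents (outermost first) and the number of
    two-way splits that were needed.
    """
    if not factors or prod(factors) != extent:
        raise ValueError("factors must multiply to the loop extent")
    extents: List[int] = []
    remaining, steps = extent, 0
    # Peel off the innermost factor first; each iteration is one two-way split.
    for factor in reversed(factors[1:]):
        remaining, inner = split_two(remaining, factor)
        extents.append(inner)
        steps += 1
    extents.append(remaining)  # whatever is left is the outermost loop
    extents.reverse()
    return extents, steps
```

With the tile sizes from section 3.1, `i` and `j` each need three two-way splits and `k` needs one, so seven calls in total. If that repetition is the problem being described, an example along these lines might illustrate it better than the forum post.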

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply

Review comment:
       explain how this is different from all of the other examples of scheduling on docs.tvm.ai. i think the missing piece here is that the logic would reside in a TOPI schedule (but please clarify?). it would be great to also explain TOPI schedules briefly for the uninitiated. finally, the "for instance" doesn't really give me a general picture--could you describe here or give an example of the input and output of a composite schedule rule?
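
As a rough idea of the kind of input/output example that would help here, below is a toy sketch in plain python. Everything in it is an assumption for illustration: `ToyBlock`, `ToySchedule`, `auto_inline` and `apply_to_blocks` are made-up stand-ins, not the proposed meta schedule API. The input is a schedule plus one block; the output is the same schedule with zero or more primitives recorded on it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a TensorIR block."""
    name: str
    is_spatial: bool  # True if the block has no reduction axes
    is_output: bool   # True if the block writes a function output


@dataclass
class ToySchedule:
    """Hypothetical stand-in for a schedule; it only records primitive calls."""
    trace: List[str] = field(default_factory=list)

    def compute_inline(self, block: ToyBlock) -> None:
        self.trace.append(f"compute_inline({block.name})")


def auto_inline(sch: ToySchedule, block: ToyBlock) -> ToySchedule:
    """Toy composite rule: analyze the block, then decide whether to inline it."""
    if block.is_spatial and not block.is_output:
        sch.compute_inline(block)
    return sch


def apply_to_blocks(blocks: List[ToyBlock]) -> List[str]:
    """Apply the rule to each block in order and return the recorded primitives."""
    sch = ToySchedule()
    for block in blocks:
        sch = auto_inline(sch, block)
    return sch.trace
```

For a three-block program (a spatial `pad`, a reduction `matmul`, and a spatial output `relu`), this toy rule only inlines `pad`. Showing the real rule's before/after TensorIR in the same way would give a much clearer picture than the "for instance" sentence.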

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of the meta schedule design space, and that meta schedule further allows mixing those
+three styles to search jointly.
+

Review comment:
       might be clearer to add bullet points for the below sections

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.

Review comment:
       ```suggestion
   AutoScheduler (Ansor) generates schedule templates by applying a set of **SearchRule** to each stage.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and variety

Review comment:
       ```suggestion
   * **AutoTVM**: The automation system requires users to define the design space through
     per-operator "schedule templates." Therefore, programmer time is a bottleneck in scaling
     to hundreds of operators across many hardware platforms.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.

Review comment:
       it would help to explain here that there are several different ways to express operator scheduling in the Meta Schedule DSL, and that this section will enumerate those ways with one subsection for each.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported

Review comment:
       sp: parameterize

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:

Review comment:
       it would be helpful to sketch out a general vision of the approach or suggest ways to work around these limitations before enumerating the benefits. as i'm reading this right now, i'm just being made to take things at face value rather than first forming an opinion on what's being proposed and then reading your argument.
   
   essentially i'm suggesting that you could either provide examples in the above "problems" section and then note how they _could_ be improved, rather than discussing the improvements as if i've already read them

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,

Review comment:
       what's a "sampling instruction"? who does the sampling and when does it happen? does "composite scheduling" also provide a design space, or is it just a way to choose between multiple manual schedules?
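
As a rough illustration of the question above, here is a minimal sketch of what a sampling instruction could do, written in plain Python with no TVM dependency. The function name mirrors `sch.sample_perfect_tile` from the RFC text, but the body, the `rng` parameter, and the choice of distribution are assumptions made for illustration, not the RFC's implementation. The idea it shows: each execution of the schedule function draws one concrete set of tile sizes, so the same function describes a space of schedules rather than a single point.

```python
import random
from math import prod


def sample_perfect_tile(extent, n, rng):
    """Toy stand-in: pick n positive factors whose product equals extent."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Any divisor of what is left keeps the tiling "perfect".
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        choice = rng.choice(divisors)
        factors.append(choice)
        remaining //= choice
    factors.append(remaining)  # the last factor absorbs the rest
    return factors


rng = random.Random(0)
# Each call draws one point from the space; a tuner would draw and measure many.
candidates = [sample_perfect_tile(1024, n=4, rng=rng) for _ in range(3)]
assert all(prod(c) == 1024 and len(c) == 4 for c in candidates)
```

Under this reading, a manual schedule corresponds to replacing the sampled list with a fixed one such as `[16, 8, 8, 8]`.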

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a natural way to represent AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled when the schedule is constructed
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)

Review comment:
       are there rules we use to name the variables such that they could be logically traced at debug time without too much thinking? e.g. is it possible to name them after the tiling instructions or axes involved?
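
       To make the question concrete, here is a minimal sketch of one possible naming scheme. It is
       only an illustration: `NameTable` and its `fresh` method are hypothetical helpers, not part
       of TVM or of this RFC, and the hints ("matmul", "i", "k") are assumed to come from the block
       or axis an instruction operates on.

```python
from collections import defaultdict


class NameTable:
    """Hypothetical helper: hands out readable, unique names for trace variables."""

    def __init__(self):
        # One counter per hint, so names derived from the same axis are numbered together.
        self._counts = defaultdict(int)

    def fresh(self, hint):
        """Return `<hint>_<k>`, where k counts earlier names that used the same hint."""
        index = self._counts[hint]
        self._counts[hint] += 1
        return f"{hint}_{index}"


names = NameTable()
block = names.fresh("matmul")                    # name derived from the block
split_i = [names.fresh("i") for _ in range(4)]   # names derived from axis i
split_k = [names.fresh("k") for _ in range(2)]   # names derived from axis k
print(block, split_i, split_k)
```

       With such a scheme the printed trace would read `i_0, i_1, i_2, i_3 = sch.split(...)`
       rather than a flat `l4, l5, l6, l7`, at the cost of carrying a hint on every instruction.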

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a natural way to represent AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which

Review comment:
       is it possible that a sampling instruction could be followed by a scheduling instruction, and during the search, the random variable chosen to populate the sampling instruction invalidates the scheduling instruction? e.g. what happens if a split instruction contains a specific tile size, but a previous sampling instruction reduced the available inner-loop dimensions below the specified tile size?
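
       A minimal sketch of the concern, in plain Python rather than TVM. Both functions below are
       stand-ins written for this comment (this `sample_perfect_tile` only borrows the name used in
       the RFC and is not TVM's implementation). It assumes, as in the RFC example, that the
       sampled factors are passed straight to `split`; in that case the factors multiply to the
       loop extent by construction, while hard-coded factors would need a separate validity check.

```python
import random


def sample_perfect_tile(extent, n, rng):
    """Stand-in sampler: pick n positive integers whose product equals `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Only divisors of what is left are candidates, so the split stays exact.
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)  # the last factor absorbs whatever is left
    return factors


def is_perfect_split(extent, factors):
    """Check that the tile sizes exactly cover a loop of length `extent`."""
    product = 1
    for factor in factors:
        product *= factor
    return product == extent


rng = random.Random(0)
tiles = sample_perfect_tile(1024, n=4, rng=rng)
print(tiles, is_perfect_split(1024, tiles))
# Hard-coded factors are not guaranteed to fit a given extent.
print(is_perfect_split(1024, [16, 8, 8, 4]))
```

       The open question is what the system should do in the second case: reject the trace, or
       repair it during replay.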

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a natural way to represent AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to

Review comment:
       how would we know they are "equally important?" do you mean that one decision may yield better performance on one architecture, and another decision may yield better performance on another architecture?
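
       For reference, this is how I read "fork" and "union of design spaces". It is a sketch under
       the assumption that a trace can be modeled as a plain list of instruction names; the RFC's
       real trace objects carry more state, and the `fork` helper and the two alternative
       instructions below are hypothetical.

```python
def fork(trace, alternatives):
    """Return one new trace per alternative, each extending a copy of `trace`."""
    return [trace + [alternative] for alternative in alternatives]


# Shared prefix, taken from the instruction names listed in the RFC's trace example.
base = ["Sample-Perfect-Tile", "Split", "Reorder"]

# Two decisions that both look promising; neither is discarded up front.
forked = fork(base, ["Vectorize", "Unroll"])

# The search then works on the set of traces, i.e. the union of their design spaces.
design_space = {tuple(trace) for trace in forked}
print(len(design_space))
```

       If that reading is right, "equally important" seems to mean "cannot be ranked before
       measuring", so both branches are kept and the search decides.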

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repetitively invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation results in a different IRModule after being scheduled because the random decisions are

Review comment:
       i think you could state this more succinctly as "If sampling instructions are present in the trace, then scheduling is non-deterministic (random decisions may not be repeated across runs)."
   
   a follow-on question is: for debugging or for use in e.g. repeatable builds, it's likely that users will at some point want to make scheduling deterministic. shouldn't this process export a log of the random variables chosen to fill the trace, so that the IRModule then is not so precious?
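
   To make the question concrete, here is a rough sketch of what I mean by exporting a log of the
   random decisions and replaying it. It is plain Python and does not use any TVM API:
   `perfect_tiles`, `DecisionLog` and `schedule` are hypothetical names invented for illustration,
   and the loop extent of 1024 is an assumption. The printed trace in Section 4.1 already shows a
   `decision=` argument on each sampling instruction, so something along these lines may exist.

```python
import random


def perfect_tiles(extent, n):
    """Enumerate all n-tuples of positive integers whose product is `extent`."""
    if n == 1:
        return [(extent,)]
    tiles = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                tiles.append((factor,) + rest)
    return tiles


class DecisionLog:
    """Hypothetical helper: records every random decision, or replays a saved list."""

    def __init__(self, rng=None, replay=None):
        self.rng = rng or random.Random()
        self.replay = list(replay) if replay is not None else None
        self.decisions = []

    def sample_perfect_tile(self, extent, n):
        if self.replay is not None:
            # Replay mode: reuse the decision recorded at the same position.
            choice = tuple(self.replay[len(self.decisions)])
        else:
            # Sampling mode: draw uniformly from all perfect tilings.
            choice = self.rng.choice(perfect_tiles(extent, n))
        self.decisions.append(choice)
        return choice


def schedule(log):
    """Stand-in for a schedule function: only the sampling calls are modelled."""
    i_tiles = log.sample_perfect_tile(1024, n=4)
    j_tiles = log.sample_perfect_tile(1024, n=4)
    k_tiles = log.sample_perfect_tile(1024, n=2)
    return i_tiles, j_tiles, k_tiles


# First run samples randomly and keeps the log of decisions.
first = DecisionLog(rng=random.Random(0))
result = schedule(first)

# Second run replays the exported decisions and is therefore deterministic.
second = DecisionLog(replay=first.decisions)
replayed = schedule(second)
```

   With a log like `first.decisions` saved next to the build, the scheduled IRModule could be
   regenerated on demand instead of having to be kept around.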

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function

Review comment:
       make these into headings or sub-sections

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each

Review comment:
       by "function" do you mean "trace"?
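
To make the function/trace distinction concrete, here is a minimal toy sketch in plain Python. It is not TVM code: `toy_sample_perfect_tile` and `schedule_fn` are hypothetical stand-ins, and the loop extent of 1024 is an assumption taken from the decisions in the printed trace quoted above. The point it tries to show is that the function is invoked repeatedly, and each invocation records one trace with the same instructions but possibly different sampling decisions.

```python
import random
from typing import List, Tuple

# One trace entry: (instruction name, sampling decision or [] for non-sampling steps)
Trace = List[Tuple[str, List[int]]]


def toy_sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Toy stand-in for a sampling instruction: n factors whose product is extent."""
    factors: List[int] = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)  # the last factor absorbs whatever is left
    return factors


def schedule_fn(rng: random.Random) -> Trace:
    """Hypothetical schedule function; returns the trace recorded by one invocation."""
    trace: Trace = []
    for extent, n in [(1024, 4), (1024, 4), (1024, 2)]:
        trace.append(("Sample-Perfect-Tile", toy_sample_perfect_tile(extent, n, rng)))
    trace += [("Split", []), ("Split", []), ("Split", []), ("Reorder", [])]
    return trace


rng = random.Random(0)
traces = [schedule_fn(rng) for _ in range(3)]  # replay the function three times
for t in traces:
    # Print only the sampling decisions; the instruction list is identical each time.
    print([decision for _, decision in t if decision])
```

Under this reading, "function" is the generator and "trace" is what one run of it leaves behind, so the sentence in question would be about the function.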

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of the meta schedule design space, and that meta schedule further allows mixing those
+three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides with
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language

Review comment:
       it's probably not necessary to state the language if you just document the interface here, showing it is PackedFunc

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of the meta schedule design space, and that meta schedule further allows mixing those
+three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three

Review comment:
       what is the process by which Meta Schedule takes the union of the spaces? it's conceptually simple to understand the idea, but since this is a Guide-level explanation of an implementation, more explanation should be given. in particular, i think the overarching "search" process hasn't been described in terms of steps:
   
   1. start with a Relay operator
   2. obtain TE (how?)
   3. propose a bunch of candidate TensorIR (there are several ways given here to do this, but the "glue" that binds them into an overall process is not explained)
   4. time and evaluate (mostly well-understood)
   5. somehow feed this back into a search process (not explained)
   
   i'm not sure this RFC is exactly about the auto scheduler process, but it would be helpful to add here as background, particularly since you use the word "sample" a lot.
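
As a rough illustration of the kind of end-to-end description being asked for in steps 3 to 5, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: `manual_space`, `sampled_space`, `fake_measure` and `random_search` are hypothetical names, the cost function is made up, and plain random search is used only because the quoted Section 4.2 mentions it. It is not a description of how Meta Schedule actually implements the union or the feedback step.

```python
import random
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[int, int]  # (outer, inner) tile sizes for one loop
Space = Callable[[random.Random], Candidate]

EXTENT = 64  # assumed loop extent for this toy example


def manual_space(rng: random.Random) -> Candidate:
    # A manual schedule has no sampling, so it is a single point in the space.
    return (8, 8)


def sampled_space(rng: random.Random) -> Candidate:
    # A template-style space: sample a perfect two-way tiling of the loop.
    inner = rng.choice([d for d in range(1, EXTENT + 1) if EXTENT % d == 0])
    return (EXTENT // inner, inner)


def fake_measure(cand: Candidate) -> float:
    # Stand-in for building and timing a kernel on hardware; lower is better.
    outer, inner = cand
    return abs(inner - 16) + 0.01 * outer


def random_search(
    spaces: List[Space], trials: int, seed: int
) -> Tuple[Optional[Candidate], float]:
    rng = random.Random(seed)
    best: Optional[Candidate] = None
    best_cost = float("inf")
    for _ in range(trials):
        space = rng.choice(spaces)  # union: any member space may propose a candidate
        cand = space(rng)           # step 3: propose a candidate
        cost = fake_measure(cand)   # step 4: time and evaluate
        if cost < best_cost:        # step 5: the only feedback here is keeping the best
            best, best_cost = cand, cost
    return best, best_cost


best, cost = random_search([manual_space, sampled_space], trials=200, seed=0)
print(best, cost)
```

In this sketch the "union" is nothing more than a list of generators that the search draws from; whether the real system does something smarter (for example, a cost model guiding step 5) is exactly what the RFC text would need to spell out.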

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and variety
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits one loop into nested loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in
+AutoScheduler's internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of that of meta schedule, and that meta schedule further allows mixing those three
+styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space

Review comment:
       i think you could introduce this concept in Guide-level explanation above. the ways in which it is used (union, fork) might be better left here.
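
For illustration only, here is a minimal sketch of the trace concept discussed in this comment, written in plain Python without TVM. `Instruction`, `ToyTrace`, and the `Unroll` / `Vectorize` instruction names are hypothetical stand-ins, not the RFC's API; the sketch only shows that a trace is a recorded list of instructions, that a fork is an independent copy, and that a set of traces stands for the union of their design spaces.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Instruction:
    """One recorded step: either a sampling instruction or a schedule primitive."""
    name: str
    is_sampling: bool = False


@dataclass
class ToyTrace:
    """Hypothetical stand-in for a trace: an ordered list of instructions."""
    insts: list = field(default_factory=list)

    def append(self, name, is_sampling=False):
        self.insts.append(Instruction(name, is_sampling))
        return self

    def fork(self):
        # A fork is an independent copy that may diverge afterwards.
        return copy.deepcopy(self)

    def names(self):
        return [inst.name for inst in self.insts]


# Record the seven instructions of the matmul example.
trace = ToyTrace()
for _ in range(3):
    trace.append("Sample-Perfect-Tile", is_sampling=True)
for _ in range(3):
    trace.append("Split")
trace.append("Reorder")

# Fork on two alternative decisions; the design space is the union of both traces.
left = trace.fork().append("Unroll")
right = trace.fork().append("Vectorize")
design_space = [left, right]

for t in design_space:
    print(t.names())
```

Keeping the sampling flag on each instruction is one possible way to tell which steps introduce randomness; the RFC itself does not prescribe this representation.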

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax and APIs consistent with TensorIR scheduling, with no extra layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of the
+  system can be customized easily in pure Python, in C++, or in both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits one loop into nested loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in
+AutoScheduler's internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of that of meta schedule, and that meta schedule further allows mixing those three
+styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: each instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to search, exhaustively or
+efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++, Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, each
+invocation may produce a different scheduled IRModule, because the random decisions may change
+across runs. Effectively, this is equivalent to random exploration without a trace. It gives users
+the flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any future
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and

Review comment:
       could you be more specific about what the "design space generator" is?
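
As one possible reading of "design space generator" in the hunk above, the sketch below treats it as a black-box schedule function that is replayed with fresh random decisions. It is plain Python without TVM: `sample_perfect_tile` here is a toy re-implementation (it only guarantees that the factors multiply to the loop extent), and `schedule_matmul`, `replay_search`, `toy_cost`, and the extent of 1024 are hypothetical names and values chosen for illustration.

```python
import random


def sample_perfect_tile(extent, n, rng):
    """Toy stand-in: draw n integer factors whose product equals extent."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)
    return factors


def schedule_matmul(rng, extent=1024):
    """Hypothetical schedule function: each call yields one point of the design space."""
    return {
        "i_tiles": sample_perfect_tile(extent, 4, rng),
        "j_tiles": sample_perfect_tile(extent, 4, rng),
        "k_tiles": sample_perfect_tile(extent, 2, rng),
    }


def toy_cost(candidate):
    # Made-up cost that prefers innermost tiles close to 8.
    # A real system would build the kernel and measure it on hardware.
    return sum(abs(tiles[-1] - 8) for tiles in candidate.values())


def replay_search(schedule_fn, cost_fn, num_trials, seed=0):
    """Random search: replay the black-box function and keep the cheapest candidate."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(num_trials):
        candidate = schedule_fn(rng)
        cost = cost_fn(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


best, best_cost = replay_search(schedule_matmul, toy_cost, num_trials=200)
print(best, best_cost)
```

Because the search only calls the function and scores its result, nothing about the function's internals needs to be analyzable, which matches the "opaque function" wording in the RFC text.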

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally

Review comment:
       eagerly with respect to what? by "eagerly trigger schedule primitives," it sounds like AutoScheduler is just making a decision without fully understanding the ramifications. could you elaborate/perhaps clarify the wording? i think it might be better written as "AutoScheduler (Ansor) proposes TensorIR schedules for TE subgraphs by applying a set of **SearchRule** to the naive TensorIR schedule."
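
To make "applying a set of rules" concrete, here is a small sketch in plain Python, independent of both AutoScheduler and TVM. Blocks are plain dictionaries, a trace is a list of strings, and `multi_level_tiling`, `inline_pure_spatial`, and `apply_rules` are hypothetical toy functions, not the real rule implementations. Each rule takes a trace and a block and returns one or more traces, so a rule can either extend a trace or fork it into alternatives.

```python
def multi_level_tiling(trace, block):
    """Toy rule: tile reduction blocks; leave other blocks untouched."""
    if block["kind"] != "reduction":
        return [trace]
    return [trace + [f"tile({block['name']}, 'SSRSRS')"]]


def inline_pure_spatial(trace, block):
    """Toy rule: for pure spatial blocks, offer both 'keep' and 'inline' alternatives."""
    if block["kind"] != "spatial":
        return [trace]
    return [trace, trace + [f"compute_inline({block['name']})"]]


def apply_rules(blocks, rules):
    """Apply every rule to every block in order; a rule may fork a trace into several."""
    traces = [[]]
    for block in blocks:
        for rule in rules:
            traces = [new for t in traces for new in rule(t, block)]
    return traces


blocks = [
    {"name": "matmul", "kind": "reduction"},
    {"name": "relu", "kind": "spatial"},
]
design_space = apply_rules(blocks, [multi_level_tiling, inline_pure_spatial])

for trace in design_space:
    print(trace)
```

With these two blocks the result contains two traces: one that only tiles `matmul`, and one that additionally inlines `relu`.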

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and variety
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits one loop into several nested loops.
+In practice, the overly fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine the generic rules that Ansor provides with
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies that search, either exhaustively or
+efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++, Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, each
+invocation may produce a different scheduled IRModule, because the random decisions can change
+across runs. Effectively, this is equivalent to random exploration without a trace. It gives users
+the flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any trace-based
+analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, their random decisions are mutated during each replay, i.e. the replay jumps to a new point
+in the design space. Therefore, repeated replay of those traces is equivalent to exploration of the
+design space. Our system could potentially benefit from trace-based analysis, including rejecting

Review comment:
       could you clarify "our system" here? i'm not quite sure where you're going with this--are these improvements to the system performance, or a necessary piece to make it work? (e.g. rejecting obviously-invalid schedules seems like a core piece rather than a way to improve).
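
To make the trace-replay paragraph quoted above more concrete, here is a minimal sketch in plain Python. It does not use any TVM API: the `TRACE` list, `perfect_tiles`, `replay`, and `is_valid` are hypothetical stand-ins, and the thread-count rule in `is_valid` is an assumption chosen only to show where "rejecting obviously invalid schedules" could sit in the loop.

```python
import random


def perfect_tiles(extent, n):
    """Enumerate every way to write `extent` as an ordered product of n positive integers."""
    if n == 1:
        return [[extent]]
    result = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                result.append([factor] + rest)
    return result


# A toy "trace": a list of (instruction name, arguments) pairs.
# This is an illustration only, not TVM's real trace data structure.
TRACE = [
    ("sample_perfect_tile", {"loop": "i", "extent": 64, "n": 4}),
    ("sample_perfect_tile", {"loop": "j", "extent": 64, "n": 4}),
    ("split", {"loop": "i"}),
    ("split", {"loop": "j"}),
]


def replay(trace, rng):
    """Replay the trace once, drawing a fresh decision for each sampling instruction."""
    decisions = {}
    for name, args in trace:
        if name == "sample_perfect_tile":
            choices = perfect_tiles(args["extent"], args["n"])
            decisions[args["loop"]] = rng.choice(choices)
    return decisions


def is_valid(decisions, max_threads=1024):
    """Hypothetical check: treat the product of the third tile factors as a thread count."""
    return decisions["i"][2] * decisions["j"][2] <= max_threads


def random_search(trace, num_trials, seed=0):
    """Replay the trace num_trials times and keep only the candidates that pass the check."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_trials):
        decisions = replay(trace, rng)
        if is_valid(decisions):
            candidates.append(decisions)
    return candidates
```

In this sketch the rejection step is part of the search loop itself rather than an optional optimization, which is one possible reading of the question raised in the comment.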

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of the
+  system can be customized easily in pure Python, in C++, or in both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits one loop into several nested loops.
+In practice, the overly fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine the generic rules that Ansor provides with
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies that search, either exhaustively or
+efficiently, for high-performance schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++, Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, each
+invocation may produce a different scheduled IRModule, because the random decisions can change
+across runs. Effectively, this is equivalent to random exploration without a trace. It gives users
+the flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any trace-based
+analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, their random decisions are mutated during each replay, i.e. the replay jumps to a new point
+in the design space. Therefore, repeated replay of those traces is equivalent to exploration of the
+design space. Our system could potentially benefit from trace-based analysis, including rejecting
+obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code elimination to
+simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. This is a more efficient exploration strategy, for which
+we define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: defines extra rules to apply after the trace is executed, for example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the
+    thread count is a random variable that cannot be determined before actually executing the
+    trace. In this case, we write a postprocessor that errors out when the condition is not
+    satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example:
+
+**Customize design space in python**. Can be a python function that does the schedule

Review comment:
       make these headings
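
For readers following the cost-model-guided evolutionary search described in the hunk above, the loop can be sketched in plain Python. Everything here is a simplified assumption rather than the RFC's implementation: `mutate`, `postprocess`, `predicted_score`, and `evolutionary_search` are hypothetical names, the score function is an arbitrary stand-in for a learned cost model, and the epsilon-greedy step is reduced to reserving a fraction of each generation for randomly chosen candidates.

```python
import random


def mutate(tiles, rng):
    """Mutator: move a factor of 2 between two tile levels to reach a 'neighbor'."""
    tiles = list(tiles)
    sources = [idx for idx, t in enumerate(tiles) if t % 2 == 0]
    if not sources:
        return tiles
    src = rng.choice(sources)
    dst = rng.choice([idx for idx in range(len(tiles)) if idx != src])
    tiles[src] //= 2
    tiles[dst] *= 2
    return tiles


def postprocess(tiles, max_threads=1024):
    """Postprocessor: reject a candidate (return None) if a resource limit is exceeded.

    Treating tiles[2] * tiles[3] as a thread count is an assumption for illustration.
    """
    return tiles if tiles[2] * tiles[3] <= max_threads else None


def predicted_score(tiles):
    """Stand-in cost model (higher is better); a real system would use a learned model."""
    return -sum((t - 8) ** 2 for t in tiles)


def evolutionary_search(init, iterations=20, population=16, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    pool = [list(init)]
    for _ in range(iterations):
        # Generate children by mutating randomly chosen parents, then postprocess them.
        children = []
        for _ in range(population):
            child = postprocess(mutate(rng.choice(pool), rng))
            if child is not None:
                children.append(child)
        ranked = sorted(pool + children, key=predicted_score, reverse=True)
        # Simplified epsilon-greedy: keep mostly top-scored candidates (exploitation)
        # plus a few randomly chosen lower-ranked ones (exploration).
        n_random = int(epsilon * population)
        n_best = population - n_random
        rest = ranked[n_best:]
        pool = ranked[:n_best] + rng.sample(rest, min(n_random, len(rest)))
    return max(pool, key=predicted_score)
```

In a real tuning loop the top-scored candidates would then be compiled and measured on device; the sketch stops at the predicted ranking.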

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Each component of the
+  system can be customized easily in pure Python, in C++, or in both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits one loop into several nested loops.
+In practice, the overly fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet
+of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies that search for high-performance
+schedules, either exhaustively or efficiently.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++ or Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, then
+each invocation may result in a different scheduled IRModule, because the random decisions can
+change across runs. Effectively, this is equivalent to random exploration without a trace. It
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the thread
+    count is a random variable that cannot be determined before actually executing the trace. In
+    this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability

Review comment:
       nit: Python-first

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet
+of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies that search for high-performance
+schedules, either exhaustively or efficiently.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function can be written in C++ or Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, then
+each invocation may result in a different scheduled IRModule, because the random decisions can
+change across runs. Effectively, this is equivalent to random exploration without a trace. It
+gives users the flexibility to define arbitrary functions that a trace may not support well (e.g.
+control flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the thread
+    count is a random variable that cannot be determined before actually executing the trace. In
+    this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then

Review comment:
       i think this process might be best explained with a numbered list. also, can you provide more context including what the process starts and ends with?
   
   The evolutionary search process is as follows:
   1. Begin with a naive TIR schedule
   2. Apply Mutator to find possible schedule transformations
   3. Pick one (how?)
   4. Apply post-processor to the choice to validate/tweak it
   5. Measure or predict performance with cost model (explain when you would do one or the other)
   6. Repeat steps 2-5 until <exit condition>
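
   To make the question concrete, here is a minimal sketch of how such a loop could look. It is
   only an illustration under stated assumptions, not the RFC's implementation: a "schedule" is
   reduced to a list of tile-size decisions for one loop, and every name in it (`perfect_tiles`,
   `mutate`, `postprocess`, `predict_score`, `measure`, `evolutionary_search`) is hypothetical
   and not a TVM API. The cost model and the measurement are toy stand-ins.

```python
import math
import random

EXTENT = 64          # assumed extent of the loop being tiled
MAX_INNERMOST = 16   # assumed limit that the postprocessor enforces


def perfect_tiles(extent, n, rng):
    """Toy stand-in for tile sampling: n factors whose product equals extent."""
    factors, remaining = [], extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)
    return factors


def mutate(decision, rng):
    """Mutator: move one divisor from a tile to another tile (a 'neighbor')."""
    new = list(decision)
    donors = [i for i, f in enumerate(new) if f > 1]
    if not donors:
        return new
    i = rng.choice(donors)
    j = rng.choice([k for k in range(len(new)) if k != i])
    d = rng.choice([d for d in range(2, new[i] + 1) if new[i] % d == 0])
    new[i] //= d
    new[j] *= d
    return new


def postprocess(decision):
    """Postprocessor: reject candidates that violate the (assumed) limit."""
    return decision if decision[-1] <= MAX_INNERMOST else None


def measure(decision):
    """Toy 'on-device latency'; lower is better."""
    return abs(decision[-1] - 8) + 0.1 * abs(decision[0] - decision[1])


def predict_score(decision):
    """Toy cost model: a cheap, imperfect approximation; higher is better."""
    return -abs(decision[-1] - 8)


def sample_valid(rng):
    """Sample until the postprocessor accepts the candidate."""
    while True:
        d = perfect_tiles(EXTENT, 4, rng)
        if postprocess(d) is not None:
            return d


def evolutionary_search(rng, population_size=16, iterations=5,
                        epsilon=0.2, num_measure=4):
    # Start: an initial population sampled from the design space.
    population = [sample_valid(rng) for _ in range(population_size)]
    for _ in range(iterations):
        # Mutate, validate with the postprocessor, rank with the cost model.
        children = [mutate(p, rng) for p in population]
        candidates = [c for c in population + children
                      if postprocess(c) is not None]
        candidates.sort(key=predict_score, reverse=True)
        population = candidates[:population_size]
    # Epsilon-greedy: mostly top-scored candidates, plus a few random ones.
    n_random = int(round(epsilon * num_measure))
    picked = population[:num_measure - n_random]
    rest = population[num_measure - n_random:]
    picked += rng.sample(rest, min(n_random, len(rest)))
    # End: only the picked candidates are actually measured.
    return min(picked, key=measure)


best = evolutionary_search(random.Random(0))
print(best, measure(best))
```

   In this reading, the cost model is used inside the loop because prediction is cheap, and
   real measurement happens only once per round on a small batch. Whether that matches the
   intended design is exactly what the RFC text should spell out.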

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
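
As a rough illustration of what a "perfect tile" sample means in the snippet above, the sketch below draws `n` factors whose product equals the loop extent. It is a plain-Python stand-in, not TVM code: the helper name `sample_perfect_tile_sketch`, the loop extent of 1024, and the uniform choice among divisors are all assumptions made for this example.

```python
import random


def sample_perfect_tile_sketch(extent, n, rng):
    """Return n positive integers whose product is exactly `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Any divisor of what is left keeps the tiling "perfect" (no remainder).
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        choice = rng.choice(divisors)
        factors.append(choice)
        remaining //= choice
    # The last factor absorbs whatever is left so the product matches the extent.
    factors.append(remaining)
    return factors


rng = random.Random(0)
i_tiles = sample_perfect_tile_sketch(1024, 4, rng)  # hypothetical extent of loop `i`
k_tiles = sample_perfect_tile_sketch(1024, 2, rng)  # hypothetical extent of loop `k`
```

Each call returns one point of the tiling space; calling it repeatedly with a different random state enumerates other points, which is what turns a single schedule into a design space.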
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
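
To make the idea of applying rules to each block more concrete, here is a minimal plain-Python sketch. Everything in it is hypothetical: a "schedule" is modelled as a list of step strings, blocks are plain names given in an assumed traversal order, and `inline_rule`, `tiling_rule`, and `post_order_apply` are illustrative names rather than the `ms.PostOrderApply` implementation. The point it shows is that a rule may return several schedules, so applying rules block by block expands one schedule into a design space.

```python
from typing import Callable, List

# Hypothetical stand-ins: a "schedule" is just the list of steps applied so far.
Schedule = List[str]
Rule = Callable[[Schedule, str], List[Schedule]]


def inline_rule(sch: Schedule, block: str) -> List[Schedule]:
    # Makes one deterministic choice for the blocks it recognizes.
    if block.startswith("elemwise"):
        return [sch + [f"compute_inline({block})"]]
    return [sch]


def tiling_rule(sch: Schedule, block: str) -> List[Schedule]:
    # Forks the schedule: keep the block untouched, or tile it.
    if block.startswith("matmul"):
        return [sch, sch + [f"multi_level_tiling({block})"]]
    return [sch]


def post_order_apply(blocks: List[str], rules: List[Rule]) -> List[Schedule]:
    """Apply every rule to every block in order, expanding the set of schedules."""
    schedules: List[Schedule] = [[]]
    for block in blocks:
        for rule in rules:
            schedules = [out for sch in schedules for out in rule(sch, block)]
    return schedules


space = post_order_apply(["elemwise_add", "matmul"], [inline_rule, tiling_rule])
```

Here `space` holds two candidate schedules, one with and one without tiling on the `matmul` block, and both include the inlining step.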
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
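
The notions of trace, union, and fork described above can be sketched with a small data structure. This is an illustrative model only: `Instruction`, `TraceSketch`, and the `fork` signature are assumptions for this example, not the classes used by the system, and the fork here simply overrides one sampling decision.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Instruction:
    name: str
    # Only sampling instructions carry a decision (the sampled values).
    decision: Optional[Tuple[int, ...]] = None


@dataclass
class TraceSketch:
    insts: List[Instruction] = field(default_factory=list)

    def append(self, name: str, decision=None) -> None:
        self.insts.append(Instruction(name, None if decision is None else tuple(decision)))

    def sampling_insts(self) -> List[Instruction]:
        return [inst for inst in self.insts if inst.decision is not None]

    def fork(self, index: int, new_decision) -> "TraceSketch":
        """Copy the trace, overriding the decision of the instruction at `index`."""
        forked = TraceSketch([Instruction(i.name, i.decision) for i in self.insts])
        forked.insts[index].decision = tuple(new_decision)
        return forked


# Record the seven instructions of the matmul example.
trace = TraceSketch()
trace.append("Sample-Perfect-Tile", [32, 1, 16, 2])
trace.append("Sample-Perfect-Tile", [64, 4, 2, 2])
trace.append("Sample-Perfect-Tile", [64, 16])
for name in ["Split", "Split", "Split", "Reorder"]:
    trace.append(name)

# Fork on the third sampling decision; the original trace is left untouched.
forked = trace.fork(2, [128, 8])
design_space = [trace, forked]  # a set of traces stands for the union of their spaces
```

Keeping the decisions on the sampling instructions is the design choice that matters: it lets the same instruction list be replayed with different decisions without re-running the user's schedule function.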
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after scheduling, because the random decisions may
+change across runs. Effectively, it is equivalent to random exploration without a trace. This gives
+users the flexibility to define arbitrary functions that a trace may not support well (e.g. control
+flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the
+    thread count is a random variable that cannot be determined before
+    actually executing the trace. In this case, we write a postprocessor that errors out when the

Review comment:
       again consider dropping "we," here and below

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python or C++ or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without using the trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), then repeatedly picks a random trace, mutates its random decisions, and re-executes it.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: defines extra rules to apply after the trace is executed, for example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the thread count is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
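
The loop described above can be sketched on a toy problem. The sketch below is a simplified stand-in under several assumptions: candidates are plain tile-size lists rather than traces, `mutate` moves a factor of 2 between tile positions, `postprocess` treats the product of two tile positions as the thread count, and `toy_cost_model` is an arbitrary scoring function in place of a learned model. Compilation and on-device measurement are omitted.

```python
import random


def mutate(tiles, rng):
    """Mutator sketch: move a factor of 2 from one tile position to another."""
    tiles = list(tiles)
    sources = [i for i, t in enumerate(tiles) if t % 2 == 0]
    if not sources:
        return tiles
    src = rng.choice(sources)
    dst = rng.choice([i for i in range(len(tiles)) if i != src])
    tiles[src] //= 2
    tiles[dst] *= 2
    return tiles


def postprocess(tiles, max_threads=1024):
    """Postprocessor sketch: reject candidates that exceed the thread limit."""
    threads = tiles[1] * tiles[2]  # assumption: tiles 1 and 2 are bound to threads
    return threads <= max_threads


def toy_cost_model(tiles):
    """Stand-in for a learned cost model; a higher score is better."""
    return -abs(tiles[-1] - 8) - abs(tiles[0] - 16)


def evolutionary_search(init, rng, iters=50, population=16, eps=0.1):
    pool = [list(init)]
    for _ in range(iters):
        # Mutate random parents and keep only candidates that pass postprocessing.
        candidates = []
        for _ in range(population):
            child = mutate(rng.choice(pool), rng)
            if postprocess(child):
                candidates.append(child)
        candidates.sort(key=toy_cost_model, reverse=True)
        # Epsilon-greedy: mostly keep top-scored candidates, sometimes a random one.
        next_pool = []
        for cand in candidates[: max(1, population // 2)]:
            next_pool.append(rng.choice(candidates) if rng.random() < eps else cand)
        pool = next_pool or pool
    return max(pool, key=toy_cost_model)


best = evolutionary_search([1024, 1, 1, 1], random.Random(0))
```

Because the mutator only moves factors around, every candidate remains a perfect tiling of the same extent, so the search never leaves the design space defined by the sampling instruction.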
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in Python**. We provide two ways to define a composite schedule in Python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the decorated function into the `apply` method:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm in Python by deriving from `PySearchPolicy`; the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost all components of the system are decoupled from one another, and extensions can be easily plugged in.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       at least one drawback is that the complexity is increasing. please state some more.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
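
To make the idea of a sampled tiling concrete, here is a toy sampler in plain Python. It assumes that a "perfect" tile means the `n` factors multiply exactly to the loop extent; `toy_sample_perfect_tile` is a hypothetical helper for illustration, not the TVM implementation, and the extent of 1024 is made up.

```python
import random


def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def toy_sample_perfect_tile(extent, n, rng):
    """Draw n positive factors whose product is exactly `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        f = rng.choice(divisors(remaining))  # any divisor keeps the tiling exact
        factors.append(f)
        remaining //= f
    factors.append(remaining)  # the last factor absorbs what is left
    return factors


rng = random.Random(0)
tiles = toy_sample_perfect_tile(1024, 4, rng)
print(tiles)
```

Each call yields one point in the design space; the space itself is the set of all tilings the sampler can return.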
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of that of meta schedule, and meta schedule further allows mixing those three
+styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
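
The relationship between a trace and its design space can be sketched in plain Python. This is a toy model under stated assumptions: the trace is a list of dictionaries, the loop extents (64) are made up, and `replay` simply redraws every sampling decision. None of these names come from TVM.

```python
import random


def perfect_tilings(extent, n):
    """All ordered n-tuples of positive integers whose product is `extent`."""
    if n == 1:
        return [(extent,)]
    result = []
    for d in range(1, extent + 1):
        if extent % d == 0:
            for rest in perfect_tilings(extent // d, n - 1):
                result.append((d,) + rest)
    return result


# Sampling instructions carry a decision; the other instructions are deterministic.
trace = [
    {"inst": "sample_perfect_tile", "extent": 64, "n": 4, "decision": (4, 2, 4, 2)},
    {"inst": "sample_perfect_tile", "extent": 64, "n": 2, "decision": (4, 16)},
    {"inst": "split"},
    {"inst": "split"},
    {"inst": "reorder"},
]


def replay(trace, rng):
    """Return a copy of the trace with every sampling decision redrawn."""
    new_trace = []
    for inst in trace:
        inst = dict(inst)
        if inst["inst"] == "sample_perfect_tile":
            inst["decision"] = rng.choice(perfect_tilings(inst["extent"], inst["n"]))
        new_trace.append(inst)
    return new_trace


rng = random.Random(0)
points = [replay(trace, rng) for _ in range(5)]
```

The instruction sequence stays fixed across replays and only the decisions vary, so the set of reachable decision combinations is the design space of this one trace.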
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke the opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may produce a different scheduled IRModule, because the random decisions may change
+across runs. Effectively, this is equivalent to random exploration without a trace. It gives users
+the flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any trace-based
+analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeatedly replaying those traces is equivalent to exploring
+the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per
+    block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out
+    when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
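
The loop described above can be sketched as a toy in plain Python. Everything here is an assumption made for illustration: a design point is a `(threads, unroll)` pair, `predicted_score` is a made-up stand-in for the cost model, and epsilon-greedy is applied when selecting the next population. The actual system operates on traces and measures the selected candidates on device.

```python
import random

THREAD_CHOICES = [2 ** i for i in range(13)]  # 1, 2, ..., 4096
UNROLL_CHOICES = [0, 16, 64, 512]
MAX_THREADS = 1024  # the thread limit mentioned above


def mutate(point, rng):
    """Mutator: jump to a neighbor by redrawing one of the two decisions."""
    threads, unroll = point
    if rng.random() < 0.5:
        threads = rng.choice(THREAD_CHOICES)
    else:
        unroll = rng.choice(UNROLL_CHOICES)
    return (threads, unroll)


def postprocess(point):
    """Postprocessor: reject points that exceed the thread limit."""
    return point if point[0] <= MAX_THREADS else None


def predicted_score(point):
    """Made-up cost model: higher is better, peaking at (256, 64)."""
    threads, unroll = point
    return -abs(threads - 256) / 256 - abs(unroll - 64) / 64


def evolutionary_search(rng, population_size=8, iterations=10, epsilon=0.2):
    # Start from random points that pass the postprocessor.
    population = []
    while len(population) < population_size:
        point = postprocess((rng.choice(THREAD_CHOICES), rng.choice(UNROLL_CHOICES)))
        if point is not None:
            population.append(point)
    for _ in range(iterations):
        children = [postprocess(mutate(p, rng)) for p in population]
        pool = set(population) | {c for c in children if c is not None}
        ranked = sorted(pool, key=predicted_score, reverse=True)
        # Epsilon-greedy: mostly keep the top-ranked points, sometimes a random one.
        population = [
            rng.choice(ranked) if rng.random() < epsilon else ranked[i % len(ranked)]
            for i in range(population_size)
        ]
    return sorted(population, key=predicted_score, reverse=True)


candidates = evolutionary_search(random.Random(0))
```

Only points that survive the postprocessor ever enter the population, so every returned candidate respects the thread limit by construction.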
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example,
+
+**Customize design space in python**. The design space can be a Python function that performs the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. If the `initialize` method is empty, use a decorator as syntactic sugar, which converts
+the function into the `apply` method:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions
+can be easily plugged in.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do
+not require any change in the existing codebase, or extra APIs to learn. It could potentially lower
+the bar of using automation systems.
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully
+automatic sketch generation provides a flexible way to balance the injection of new domain
+knowledge and automation.
+
+Flexibility in customization allows quick try-out on new tasks, new strategies and new hardware
+targets without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides
+convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying
+schedule primitives opens the door for meta schedule, our proposed new-generation auto scheduling
+system.
+
+**AutoTVM** is the 1st generation automation framework in TVM, which requires developers to
+implement per-operator scheduling templates, and the system could handle the tuning process.
+
+**AutoScheduler (Ansor)** is the 2nd generation automation framework in TVM, whose built-in rules
+could automatically generate schedule templates for almost all the operators on CPU, GPU, etc.
+
+## 8. Unresolved questions
+
+**Supporting Control Flow and Assertions**
+
+Right now the meta schedule DSL does not support control flow. Although we didn’t see any real-world

Review comment:
       could you give an example of what control flow would look like in the DSL, even if not real-world?
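
For illustration only, one hypothetical shape such control flow could take: a schedule function that branches on the value of a sampled random variable, as in the "control flow divergence" case mentioned in Section 4.2. `ToySchedule` is a stand-in written for this sketch and is not a TVM class; the loop names are made up.

```python
import random


class ToySchedule:
    """Hypothetical stand-in for a schedule object; it only records instructions."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.trace = []

    def sample_categorical(self, candidates):
        choice = self.rng.choice(candidates)
        self.trace.append(("sample_categorical", choice))
        return choice

    def unroll(self, loop):
        self.trace.append(("unroll", loop))

    def vectorize(self, loop):
        self.trace.append(("vectorize", loop))


def schedule_with_branch(sch):
    # Which primitive runs next depends on the value of a random variable.
    step = sch.sample_categorical([0, 16, 64])
    if step > 0:
        sch.unroll("k_1")
    else:
        sch.vectorize("j_3")
    return sch


traces = [schedule_with_branch(ToySchedule(seed)).trace for seed in range(20)]
```

Different runs can record different instruction sequences, which a single linear trace cannot represent.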







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668540347



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:

Review comment:
       I added a paragraph to reveal the problem of the existing AutoTVM and Ansor tuning







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668524060



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python or C++ or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of that of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without using or taking any advantage of trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), repetitively mutates the random decisions of a random trace and re-executes the traces.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized python implementation. For example,
+
+**Customize design space in python**. The design space can be a python function that does the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. A decorator as the syntactic sugar if the `initialize` method is empty, which converts the function to the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       Another drawback is that we will need to migrate TE schedules in TOPI to TensorIR schedules (remove most of them actually)







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664876764



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware target.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function and makes no use of the trace.
+
+**Random search**. Extracts the traces as the design space from any design space generator (e.g. a user-provided schedule function, composite schedule rules applied to each block, or any custom space generator), then repeatedly mutates the random decisions of a randomly chosen trace and re-executes it.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for example:
+  * Check CUDA resource limits: CUDA imposes a hard limit that the number of threads in a block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the trace. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized python implementation. For example,
+
+**Customize design space in python**. The design space can be a python function that does the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. A decorator as the syntactic sugar if the `initialize` method is empty, which converts the function to the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with `PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.

Review comment:
       Most of the schedule templates will be generated by the schedule rules, so i could imagine that we can remove most of the schedules in the future
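
As a rough illustration of Section 4.1 of the RFC text quoted in this message, here is a minimal sketch of how a schedule function with sampling instructions can be recorded as a trace and then replayed with some decisions overridden. `ToySchedule` and the free function `sample_perfect_tile` are hypothetical stand-ins written for this sketch; they are not the TVM API and they do not transform any IR.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Toy sampler: pick n positive factors whose product equals `extent`."""
    factors: List[int] = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)
    return factors


class ToySchedule:
    """Hypothetical schedule class that only records instructions (no real IR)."""

    def __init__(
        self,
        extents: Dict[str, int],
        rng: random.Random,
        decisions: Optional[Dict[int, List[int]]] = None,
    ) -> None:
        self.extents = extents
        self.rng = rng
        # Maps an instruction index to a fixed decision used during replay.
        self.decisions = decisions or {}
        self.trace: List[Tuple[str, str, List[int]]] = []

    def sample_perfect_tile(self, loop: str, n: int) -> List[int]:
        index = len(self.trace)
        tiles = self.decisions.get(index)
        if tiles is None:
            tiles = sample_perfect_tile(self.extents[loop], n, self.rng)
        self.trace.append(("sample_perfect_tile", loop, list(tiles)))
        return tiles

    def split(self, loop: str, factors: List[int]) -> None:
        self.trace.append(("split", loop, list(factors)))


def schedule_fn(sch: ToySchedule) -> None:
    i_tiles = sch.sample_perfect_tile("i", n=4)
    k_tiles = sch.sample_perfect_tile("k", n=2)
    sch.split("i", i_tiles)
    sch.split("k", k_tiles)


extents = {"i": 1024, "k": 1024}

# First run: every sampling instruction draws a fresh decision.
first = ToySchedule(extents, random.Random(0))
schedule_fn(first)

# Replay: keep the decision of instruction 0, override instruction 1.
replayed = ToySchedule(
    extents,
    random.Random(1),
    decisions={0: first.trace[0][2], 1: [64, 16]},
)
schedule_fn(replayed)
```

Each distinct set of decisions corresponds to one point in the design space; changing a single decision and re-running the function, as in the replay above, is roughly what the quoted text means by mutating a trace.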







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668325916



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure python or C++ or both. For example, one can develop a
+  new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally

Review comment:
       Revised:
   
   ```markdown
   To generate the design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
   The rules analyze the TE operations and apply an internal DSL to manipulate its internal IR,
   which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
   
   Composite schedule rules work in a similar way when scheduling TensorIR, as introduced in Section 3.2.
   They analyze the TensorIR and apply schedule primitives directly to it accordingly.
   When such rules are applied to each TensorIR block in a certain order (e.g. Post-DFS Order),
   they generate a sequence of schedule primitives.
   If sampling instructions are present in this sequence,
   the support of the probability space forms a design space of possible schedulings.
   This process is similar to the *sketch generation* phase in AutoScheduler.
   ```
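
The rule application described in the revised text can be sketched roughly as follows. This is a toy model under stated assumptions, not TVM code: a block is just a name, the block tree is a plain dictionary, and `inline_rule` and `tiling_rule` are hypothetical rules that append primitive names to a list instead of transforming TensorIR.

```python
from typing import Callable, Dict, List

# Hypothetical rule type: inspects a block name and may append instructions.
Rule = Callable[[str, List[str]], None]


def post_order(block: str, children: Dict[str, List[str]]) -> List[str]:
    """Return blocks in post-DFS order: children before their parent."""
    order: List[str] = []
    for child in children.get(block, []):
        order.extend(post_order(child, children))
    order.append(block)
    return order


def inline_rule(block: str, insts: List[str]) -> None:
    # Toy analysis: treat blocks named "*_elemwise" as pure spatial blocks.
    if block.endswith("_elemwise"):
        insts.append(f"compute_inline({block})")


def tiling_rule(block: str, insts: List[str]) -> None:
    # Toy analysis: tile the block named "matmul" with sampled tile sizes.
    if block == "matmul":
        insts.append(f"sample_perfect_tile({block})")
        insts.append(f"split({block})")
        insts.append(f"reorder({block})")


def apply_rules(
    root: str, children: Dict[str, List[str]], rules: List[Rule]
) -> List[str]:
    """Apply every rule to every block, visiting blocks in post-DFS order."""
    insts: List[str] = []
    for block in post_order(root, children):
        for rule in rules:
            rule(block, insts)
    return insts


tree = {"root": ["matmul", "relu_elemwise"]}
sequence = apply_rules("root", tree, [inline_rule, tiling_rule])
print(sequence)
```

The resulting list plays the role of the generated sequence of schedule primitives; because it contains a sampling instruction, different sampled decisions would yield different concrete schedules.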







[GitHub] [tvm-rfcs] tqchen merged pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen merged pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5


   





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661745758



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).

Review comment:
       added etc here







[GitHub] [tvm-rfcs] FrozenGene commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
FrozenGene commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648103811



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it with purely random sampling decisions.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized Python implementation. For example:
+
+**Customize design space in python**. The design space can be defined by a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...

Review comment:
       I think we could provide a bit more code here how to leverage `sch` and `block` here to do `multi level tiling` like previous `manual schedule` / `AutoTVM`
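
As one possible shape for such an example, here is a self-contained toy in plain Python. It does not use TVM: `ToySchedule` is a hypothetical stand-in that only records the instructions applied, and the spatial and reduction loops are passed in directly, whereas a real rule would presumably derive them by inspecting `block`. The body mirrors the manual and AutoTVM-style matmul schedules quoted above.

```python
import random
from typing import Dict, List


def sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Pick n factors whose product is exactly `extent` (toy version)."""
    factors, remaining = [], extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)
    return factors


class ToySchedule:
    """Stand-in for a schedule: it only records the instructions applied."""

    def __init__(self, loops: Dict[str, int], seed: int = 0):
        self.loops = dict(loops)  # loop name -> extent
        self.rng = random.Random(seed)
        self.trace = []

    def sample_perfect_tile(self, loop: str, n: int) -> List[int]:
        tiles = sample_perfect_tile(self.loops[loop], n, self.rng)
        self.trace.append(("sample_perfect_tile", loop, tuple(tiles)))
        return tiles

    def split(self, loop: str, factors: List[int]) -> List[str]:
        self.trace.append(("split", loop, tuple(factors)))
        return [f"{loop}_{idx}" for idx in range(len(factors))]

    def reorder(self, *loops: str) -> None:
        self.trace.append(("reorder", loops))


def multi_level_tiling(sch: ToySchedule, spatial: List[str], reduction: List[str]) -> ToySchedule:
    """Tile spatial loops 4 ways and reduction loops 2 ways, ordered as SSRSRS."""
    s = [sch.split(loop, sch.sample_perfect_tile(loop, n=4)) for loop in spatial]
    r = [sch.split(loop, sch.sample_perfect_tile(loop, n=2)) for loop in reduction]
    s_level = r_level = 0
    order = []
    for kind in "SSRSRS":
        if kind == "S":
            order += [tiles[s_level] for tiles in s]
            s_level += 1
        else:
            order += [tiles[r_level] for tiles in r]
            r_level += 1
    sch.reorder(*order)
    return sch


sch = multi_level_tiling(ToySchedule({"i": 1024, "j": 1024, "k": 2048}), ["i", "j"], ["k"])
```

The recorded `sch.trace` has the same seven kinds of entries as the trace listed in Section 4.1, interleaved per loop rather than grouped by instruction kind.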

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...

Review comment:
       What is the purpose and advantages compared with 1st decorator method? I think we should list it so that developers could know which way should choose according to different conditions.
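
One plausible reading of the difference, sketched as a toy in plain Python (the RFC text does not confirm this; `ToyCompositeSchedule`, `REGISTRY`, `AutoUnroll` and the `target` argument are all hypothetical, and a schedule is modelled as a plain list of strings): the decorator suits stateless rules, while the class form suits rules that need state prepared once in `initialize` and reused by every `apply` call.

```python
from typing import Callable, Dict, List


class ToyCompositeSchedule:
    """Base class: `initialize` sets up state, `apply` runs once per block."""
    name = "base"

    def initialize(self, target: str) -> None:
        pass

    def apply(self, sch: List[str], block: str) -> List[str]:
        raise NotImplementedError


class _FunctionRule(ToyCompositeSchedule):
    """Adapter that turns a plain function into a stateless rule."""

    def __init__(self, name: str, func: Callable[[List[str], str], List[str]]):
        self.name = name
        self._func = func

    def apply(self, sch: List[str], block: str) -> List[str]:
        return self._func(sch, block)


REGISTRY: Dict[str, ToyCompositeSchedule] = {}


def as_composite_schedule(name: str) -> Callable:
    """Method 1: wrap a function into a rule object and register it by name."""
    def decorator(func):
        rule = _FunctionRule(name, func)
        REGISTRY[name] = rule
        return rule
    return decorator


@as_composite_schedule(name="inline-spatial")
def inline_spatial(sch: List[str], block: str) -> List[str]:
    return sch + [f"compute_inline({block})"]


class AutoUnroll(ToyCompositeSchedule):
    """Method 2: a rule whose behaviour depends on state set in `initialize`."""
    name = "auto-unroll"

    def initialize(self, target: str) -> None:
        # Derived once from the target, then reused by every `apply` call.
        self.max_step = 512 if target == "cuda" else 64

    def apply(self, sch: List[str], block: str) -> List[str]:
        return sch + [f"unroll({block}, max_step={self.max_step})"]


unroll = AutoUnroll()
unroll.initialize(target="cuda")
result = unroll.apply(inline_spatial.apply([], "relu"), "matmul")
```

Under this reading, the decorator is a thin convenience over the class, so developers would reach for Method 2 only when a rule needs per-task setup.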

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
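To make the idea of sampling a "perfect" tiling concrete, here is a minimal pure-Python sketch. It is an illustration only and not the RFC's implementation: the standalone function `sample_perfect_tile(extent, n, rng)` and its helper `divisors` are hypothetical, and the sketch assumes a perfect tiling means `n` positive factors whose product equals the loop extent. The sampling below is also not uniform over all tilings.

```python
import random
from math import prod


def divisors(x):
    """Return all positive divisors of x in ascending order."""
    return [d for d in range(1, x + 1) if x % d == 0]


def sample_perfect_tile(extent, n, rng):
    """Hypothetical stand-in: draw n positive factors whose product is `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Pick any divisor of what is left, so the split stays exact.
        f = rng.choice(divisors(remaining))
        factors.append(f)
        remaining //= f
    factors.append(remaining)  # the last factor absorbs the remainder
    return factors


rng = random.Random(0)
tiles = sample_perfect_tile(1024, 4, rng)
print(tiles, prod(tiles))
```

Each call yields one point of the tiling space; re-running with a different seed yields another point of the same space.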
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats each search rule as a composite schedule, and applies every composite schedule to each TensorIR block to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing these three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL (probabilistic programming language) generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
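The recording behaviour described above can be sketched with a toy class. This is an illustration under stated assumptions, not the RFC's data structure: `ToySchedule` is a hypothetical stand-in for the `Schedule` class, loops are plain strings, and the methods only record instruction names instead of transforming any IR.

```python
class ToySchedule:
    """Hypothetical stand-in for the user-facing Schedule class; it only records calls."""

    def __init__(self):
        self.trace = []  # instruction names, in execution order

    def sample_perfect_tile(self, loop, n):
        self.trace.append("Sample-Perfect-Tile")
        # Return symbolic placeholders standing in for the random variables.
        return [f"{loop}_tile{idx}" for idx in range(n)]

    def split(self, loop, factors):
        self.trace.append("Split")
        return [f"{loop}_{idx}" for idx in range(len(factors))]

    def reorder(self, *loops):
        self.trace.append("Reorder")


sch = ToySchedule()
i_tiles = sch.sample_perfect_tile("i", n=4)
j_tiles = sch.sample_perfect_tile("j", n=4)
k_tiles = sch.sample_perfect_tile("k", n=2)
i_loops = sch.split("i", i_tiles)
j_loops = sch.split("j", j_tiles)
k_loops = sch.split("k", k_tiles)
sch.reorder(
    i_loops[0], j_loops[0], i_loops[1], j_loops[1], k_loops[0],
    i_loops[2], j_loops[2], k_loops[1], i_loops[3], j_loops[3],
)

for index, name in enumerate(sch.trace):
    print(f"Instruction {index}. {name}")
```

Running the sketch prints the same seven instruction names as the listing above, in the same order.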
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies for searching the design space, either exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and does not take advantage of anything the PPL provides.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it, making every sampling decision purely at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized python implementation. For example:
+
+**Customize design space in python**. The design space can be defined by a python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan
+
+[M3a] Core infrastructure of the PPL

Review comment:
       What is the meaning of `M3a` / `M3b` ...?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility to all possible TIR schedule primitives such as tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e., they explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, developers may tweak the tile sizes and measure the performance of the generated kernels to look for further optimization opportunities.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats each search rule as a composite schedule, and applies every composite schedule to each TensorIR block to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing these three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL (probabilistic programming language) generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies for searching the design space, either exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and does not take advantage of anything the PPL provides.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it, making every sampling decision purely at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
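The loop described above (mutate, postprocess, score with a cost model, keep candidates with epsilon-greedy) can be sketched on a toy problem. Everything here is an assumption made for illustration: a candidate is a 3-level tiling of a loop of extent 64, `mutate` moves a factor of 2 between levels, `postprocess` enforces a made-up limit on the innermost tile, and `predicted_score` is an invented cost model. Compilation and on-device measurement are omitted.

```python
import random

EXTENT = 64  # toy loop extent (assumption)


def mutate(tiles, rng):
    """Mutator: move a factor of 2 between two tile levels; the product is unchanged."""
    tiles = list(tiles)
    src, dst = rng.sample(range(len(tiles)), 2)
    if tiles[src] % 2 == 0:
        tiles[src] //= 2
        tiles[dst] *= 2
    return tuple(tiles)


def postprocess(tiles):
    """Postprocessor: reject candidates whose innermost tile exceeds a made-up limit."""
    return tiles if tiles[-1] <= 16 else None


def predicted_score(tiles):
    """Invented cost model: prefers an innermost tile near 8 and a small outermost tile."""
    return -abs(tiles[-1] - 8) - 0.1 * tiles[0]


def evolutionary_search(rng, population_size=8, iterations=10, epsilon=0.2):
    population = [(EXTENT, 1, 1)] * population_size
    for _ in range(iterations):
        children = [postprocess(mutate(parent, rng)) for parent in population]
        # Parents are always valid, so the candidate set is never empty.
        candidates = set(population) | {c for c in children if c is not None}
        ranked = sorted(candidates, key=predicted_score, reverse=True)
        population = []
        for rank in range(population_size):
            if rng.random() < epsilon:
                population.append(rng.choice(ranked))          # explore
            else:
                population.append(ranked[rank % len(ranked)])  # exploit, best first
    return max(population, key=predicted_score)


best = evolutionary_search(random.Random(0))
print(best)
```

In a real system the candidates with the highest predicted scores would then be built and measured on the device, and the measurements would be fed back to the cost model.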
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to a customized python implementation. For example:
+
+**Customize design space in python**. The design space can be defined by a python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database

Review comment:
       What is the purpose of Database? 

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility to all possible TIR schedule primitives such as tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e., they explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, developers may tweak the tile sizes and measure the performance of the generated kernels to look for further optimization opportunities.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
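The manual enumeration mentioned above can be sketched as follows. This is an illustration only: `perfect_tilings` is a hypothetical helper, and the sketch assumes that the extent of loop `k` is 2048 (the product of `k_tiles = [256, 8]`) and that only tilings whose factors multiply exactly to the extent are of interest.

```python
def perfect_tilings(extent, n):
    """Yield every ordered tuple of n positive factors whose product is `extent`."""
    if n == 1:
        yield (extent,)
        return
    for f in range(1, extent + 1):
        if extent % f == 0:
            # Fix the outermost factor, then tile what remains with n - 1 levels.
            for rest in perfect_tilings(extent // f, n - 1):
                yield (f,) + rest


k_tilings = list(perfect_tilings(2048, 2))
print(len(k_tilings))         # 12, because 2048 = 2**11 has 12 divisors
print((256, 8) in k_tilings)  # the manual choice above is one of them
```

A developer would build and measure one kernel per enumerated tuple; the number of tuples grows quickly with the number of tile levels, which is what motivates describing the space with sampling instructions instead.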
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats each search rule as a composite schedule, and applies every composite schedule to each TensorIR block to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedules, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing these three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL (probabilistic programming language) generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies for searching the design space, either exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and does not take advantage of anything the PPL provides.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it, making every sampling decision purely at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
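The first postprocessor example above can be sketched as a simple validity check. This is an illustration under stated assumptions: `verify_thread_count` is a hypothetical helper, a candidate is represented only by the extents of its thread-bound loops, and the check returns `False` rather than raising, so that a search loop can drop the candidate.

```python
MAX_THREADS_PER_BLOCK = 1024  # the CUDA limit cited above


def verify_thread_count(thread_extents):
    """Return True if the product of the thread-bound loop extents fits the limit."""
    total = 1
    for extent in thread_extents:
        total *= extent
    return total <= MAX_THREADS_PER_BLOCK


# Thread extents are only known after the sampling decisions have been made,
# so the check runs on concrete candidates rather than on the program itself.
candidates = [(32, 32), (64, 32), (8, 8, 8)]
valid = [c for c in candidates if verify_thread_count(c)]
print(valid)
```

Here `(64, 32)` is rejected because it would need 2048 threads, while the other two candidates stay in the pool.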
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. The design space can be defined by a Python function that applies the schedule:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan
+
+[M3a] Core infrastructure of the PPL
+* Instruction
+* Trace
+* Composite schedule
+* Sampler
+* Search policy
+* Design space generator
+
+[M3b] Host-side search infra
+* Database
+* Cost model
+* Measure callback
+
+[M3c] RPC-related search infra
+* Measure input, build result, measure result
+* Builder
+* Runner
+
+[M4a] Implementation of rules
+* Various built-in composite schedules
+* Various built-in mutators
+* Various built-in postprocessors
+* Automatic tensorization
+
+[M4b] Relay integration
+

Review comment:
       We should have a section covering documentation and tutorials, for example how to use meta schedule to run auto-tensorization on GPU Tensor Cores end to end.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL and repeatedly re-executes it, drawing each sampling decision at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. The design space can be defined by a Python function that applies the schedule:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.

Review comment:
       A simple code example would be helpful here.
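       For instance, something along these lines would already help. This is only a sketch: the `PySearchPolicy` base class below is a stand-in made up so that the snippet runs on its own, and the `search(space, measure, num_trials)` signature, the dict-based design space and the `RandomSearch` name are all assumptions, not the API proposed in this RFC.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# One point in the design space: a value chosen for every sampling decision.
Decisions = Dict[str, int]


class PySearchPolicy:
    """Hypothetical stand-in for the RFC's base class (not the real TVM class)."""

    def search(
        self,
        space: Dict[str, List[int]],
        measure: Callable[[Decisions], float],
        num_trials: int,
    ) -> Tuple[Optional[Decisions], float]:
        raise NotImplementedError


class RandomSearch(PySearchPolicy):
    """Draws every decision at random and keeps the cheapest candidate seen."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def search(
        self,
        space: Dict[str, List[int]],
        measure: Callable[[Decisions], float],
        num_trials: int,
    ) -> Tuple[Optional[Decisions], float]:
        best: Optional[Decisions] = None
        best_cost = float("inf")
        for _ in range(num_trials):
            # Sample one value per decision, independently of the others.
            candidate = {name: self.rng.choice(choices) for name, choices in space.items()}
            cost = measure(candidate)  # lower is better in this sketch
            if cost < best_cost:
                best, best_cost = candidate, cost
        return best, best_cost
```

       A caller would pass a dict such as `{"i_tile": [1, 2, 4, 8]}` together with a cost function. The real interface would presumably work on traces and a measurer rather than plain dicts, so the RFC's own example should use whatever signature is actually proposed.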

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL and repeatedly re-executes it, drawing each sampling decision at random.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but this number is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. The design space can be defined by a Python function that applies the schedule:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan
+
+[M3a] Core infrastructure of the PPL
+* Instruction
+* Trace
+* Composite schedule
+* Sampler
+* Search policy
+* Design space generator
+
+[M3b] Host-side search infra
+* Database
+* Cost model
+* Measure callback
+
+[M3c] RPC-related search infra
+* Measure input, build result, measure result
+* Builder
+* Runner
+
+[M4a] Implementation of rules
+* Various built-in composite schedules
+* Various built-in mutators
+* Various built-in postprocessors
+* Automatic tensorization
+
+[M4b] Relay integration
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: unlike alternative solutions, it requires no changes to the existing codebase and no extra APIs to learn. This could lower the bar for using automation systems.
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully automatic sketch generation provides a flexible way to balance the injection of new domain knowledge against automation.
+
+Flexibility in customization allows new tasks, new strategies and new hardware targets to be tried out quickly without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying schedule primitives opens the door for meta schedule, our proposed new-generation auto scheduling system.
+
+**AutoTVM** is the 1st generation automation framework in TVM, which requires developers to implement per-operator scheduling templates, and the system could handle the tuning process.
+
+**AutoScheduler (Ansor)** is the 2nd generation automation framework in TVM, whose built-in rules could automatically generate schedule templates for almost all the operators on CPU, GPU, etc.
+
+## 8. Unresolved questions
+
+**Supporting Control Flow and Assertions**
+
+Right now the meta schedule DSL does not support control flow. Although we have not yet seen a real-world use case, one could appear in future workloads.

Review comment:
       Autonomous-driving algorithms have a lot of control flow in their post-processing networks. What is the definition of `control flow` here, and what are its limitations? For example, if Relay fully supported `tf.cond`, how would meta schedule handle the next step?
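       To make the question concrete, here is one possible reading of schedule-level control flow, as opposed to model-level control flow such as `tf.cond`. This is an assumption, not something the RFC states: `ToySchedule` and its methods are hypothetical stand-ins that only record instruction names. The point of the sketch is that a Python `if` on a sampled value makes the recorded instruction sequence differ between runs, so a single linear trace would not cover the whole space.

```python
import random
from typing import List, Set, Tuple


class ToySchedule:
    """Hypothetical stand-in that records instruction names instead of editing IR."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.trace: List[str] = []

    def sample_categorical(self, candidates: List[int]) -> int:
        # Sampling instruction: recorded in the trace, returns a random choice.
        self.trace.append("Sample-Categorical")
        return self.rng.choice(candidates)

    def unroll(self) -> None:
        self.trace.append("Unroll")

    def vectorize(self) -> None:
        self.trace.append("Vectorize")


def schedule_with_branch(sch: ToySchedule) -> None:
    # The branch depends on a sampled value, so the instructions that
    # follow are not known until the sampling instruction has executed.
    if sch.sample_categorical([0, 1]) == 0:
        sch.unroll()
    else:
        sch.vectorize()


def collect_traces(num_runs: int) -> Set[Tuple[str, ...]]:
    """Runs the schedule with different seeds and returns the distinct traces."""
    traces: Set[Tuple[str, ...]] = set()
    for seed in range(num_runs):
        sch = ToySchedule(seed)
        schedule_with_branch(sch)
        traces.add(tuple(sch.trace))
    return traces
```

       If this is the kind of control flow meant in the unresolved question, it would help to say so explicitly and to state whether model-level constructs like `tf.cond` are out of scope.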







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668238062



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.

Review comment:
       That is a great idea. I will explain and enumerate the content in this paragraph.







[GitHub] [tvm-rfcs] FrozenGene commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
FrozenGene commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648064710



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.

Review comment:
       `and a variety of hardware platforms`

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.

Review comment:
       I don't understand what `customizable across every layer` means. Could you explain a bit more?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles

Review comment:
       I think we should add a short note explaining what `S` (`spatial`) and `R` (`reduce`) mean.
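
To illustrate the kind of note requested here: in a matmul `C[i, j] += A[i, k] * B[k, j]`, the loops over `i` and `j` are spatial (S) because each iteration writes a distinct output element, while the loop over `k` is a reduction (R) because its iterations accumulate into the same element. The sketch below is plain Python rather than TVM code. The helper names (`fuse`, `tiled_matmul`, `reference_matmul`) and the small tile sizes are assumptions made for illustration only. It nests the tiled loops in the same "SSRSRS" order as the `sch.reorder` call and checks the result against an untiled matmul.

```python
import itertools
import math

# Illustrative tile sizes (assumptions; far smaller than the RFC example).
I_TILES = [2, 2, 2, 2]  # i is split into i_0, i_1, i_2, i_3
J_TILES = [2, 2, 2, 2]  # j is split into j_0, j_1, j_2, j_3
K_TILES = [4, 2]        # k is split into k_0, k_1


def fuse(indices, tiles):
    """Recover the original loop index from its tiled components."""
    value = 0
    for index, extent in zip(indices, tiles):
        value = value * extent + index
    return value


def tiled_matmul(a, b):
    """Matmul whose loop nest follows the SSRSRS order."""
    n, m = math.prod(I_TILES), math.prod(J_TILES)
    c = [[0] * m for _ in range(n)]
    extents = [
        I_TILES[0], J_TILES[0],  # S
        I_TILES[1], J_TILES[1],  # S
        K_TILES[0],              # R
        I_TILES[2], J_TILES[2],  # S
        K_TILES[1],              # R
        I_TILES[3], J_TILES[3],  # S
    ]
    loop_nest = itertools.product(*(range(e) for e in extents))
    for i0, j0, i1, j1, k0, i2, j2, k1, i3, j3 in loop_nest:
        i = fuse((i0, i1, i2, i3), I_TILES)  # spatial: selects an output row
        j = fuse((j0, j1, j2, j3), J_TILES)  # spatial: selects an output column
        k = fuse((k0, k1), K_TILES)          # reduction: accumulated over
        c[i][j] += a[i][k] * b[k][j]
    return c


def reference_matmul(a, b):
    """Untiled matmul used only to check the tiled version."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


a = [[i + k for k in range(8)] for i in range(16)]
b = [[k * j + 1 for j in range(16)] for k in range(8)]
assert tiled_matmul(a, b) == reference_matmul(a, b)
```

Reordering changes only the order in which the iterations are visited, so any ordering of these loops computes the same result. The choice of order affects locality and performance, not correctness.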

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).

Review comment:
       to new schedule primitives (tensorize, loop partition, software pipelining `and so on`).

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.

Review comment:
       Do you mean that we go in two directions at the same time, and the generated design space is the union of the two directions?
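
One possible reading of "fork a trace", sketched in plain Python: a trace is an ordered list of recorded instructions, forking copies that list, and each copy then continues with a different decision. The `Trace` class, its `record` and `fork` methods, and the `Compute-Inline` instruction name are all hypothetical and are not the meta schedule API. The sketch only illustrates the statement in the RFC text that the design space is the union of the forked traces.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical stand-in for a trace: an ordered list of instructions."""

    instructions: list = field(default_factory=list)

    def record(self, name, **attrs):
        # Append one instruction together with its attributes.
        self.instructions.append((name, attrs))

    def fork(self):
        # The fork gets its own copy, so later records do not affect the parent.
        return Trace(copy.deepcopy(self.instructions))


# Shared prefix of the schedule.
base = Trace()
base.record("Sample-Perfect-Tile", loop="i", n=4)
base.record("Split", loop="i")

# Two decisions that are both worth exploring: inline block "B" or keep it.
inlined = base.fork()
inlined.record("Compute-Inline", block="B")
not_inlined = base.fork()

# The design space is the union (here: a list) of the forked traces.
design_space = [inlined, not_inlined]
for trace in design_space:
    print([name for name, _ in trace.instructions])
```

Under this reading, the search explores both traces, each with its own sampling instructions, rather than committing to one decision up front.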

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.

Review comment:
       What is the meaning of `PPL`?
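
`PPL` here presumably abbreviates "probabilistic programming language", in line with the Summary's description of meta schedule as a "probabilistic scheduling DSL". The probabilistic part comes from sampling instructions such as `sch.sample_perfect_tile`. As a rough sketch of what such an instruction might do, the plain-Python code below enumerates every way to write a loop extent as an ordered product of `n` positive integers and draws one at random. The function names, the uniform distribution, and the extent of 1024 are assumptions for illustration and are not taken from the TVM implementation.

```python
import random


def perfect_tilings(extent, n):
    """Enumerate all ordered lists of n positive integers whose product is extent."""
    if n == 1:
        return [[extent]]
    tilings = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tilings(extent // factor, n - 1):
                tilings.append([factor] + rest)
    return tilings


def sample_perfect_tile(extent, n, rng):
    """Draw one perfect tiling uniformly at random (the distribution is an assumption)."""
    return rng.choice(perfect_tilings(extent, n))


rng = random.Random(0)  # fixed seed so that repeated runs draw the same sample
tiles = sample_perfect_tile(1024, 4, rng)
print(tiles)
```

A manual schedule such as `i_tiles = [16, 8, 8, 8]` then corresponds to fixing one element of this set, while the sampling version leaves the choice to the tuner.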

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)

Review comment:
       What is the meaning of `perfect` here?
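
One plausible reading, offered as a sketch rather than the RFC's definition: a tiling is "perfect" when the sampled factors multiply exactly to the loop extent, so the split leaves no remainder loop. The printed trace quoted in a later revision of this RFC is consistent with that reading (its decisions `[32, 1, 16, 2]`, `[64, 4, 2, 2]` and `[64, 16]` each multiply to 1024). The helpers below are hypothetical plain-Python stand-ins, not TVM APIs; the `max_innermost_factor` name is borrowed from that trace, and uniform sampling is an assumption.

```python
import random


def perfect_tilings(extent, n):
    """Yield every n-tuple of positive integers whose product is exactly `extent`."""
    if n == 1:
        yield (extent,)
        return
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tilings(extent // factor, n - 1):
                yield (factor,) + rest


def sample_perfect_tile(extent, n, max_innermost_factor=16, rng=random):
    """Hypothetical stand-in: pick one perfect tiling uniformly at random.

    Only tilings whose innermost (last) factor is at most
    `max_innermost_factor` are considered (assumed semantics).
    """
    candidates = [
        tiling
        for tiling in perfect_tilings(extent, n)
        if tiling[-1] <= max_innermost_factor
    ]
    return list(rng.choice(candidates))


# Example: sample a 4-level tiling of a loop with extent 1024.
tiles = sample_perfect_tile(1024, n=4, rng=random.Random(0))
product = 1
for t in tiles:
    product *= t
print(tiles, product)
```

Under this reading, every sampled point can be passed to `split` directly, because the tile sizes always divide the loop evenly.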







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664921748



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions form the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduces the uncertainty in scheduling - one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to search for high-performance schedules, either exhaustively or efficiently.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without taking any advantage of the trace.

Review comment:
       Oh i should say those random decisions are mutated when replaying the traces.
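
To illustrate the point about mutated decisions, here is a toy sketch in plain Python. The `Instruction` record, the `replay` function and the candidate list are all hypothetical and only mimic the idea; they are not the meta schedule interpreter.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Instruction:
    """One recorded step of a trace (hypothetical structure)."""
    name: str
    # Sampling instructions carry a sampler that draws a fresh decision;
    # deterministic primitives (split, reorder, ...) leave it as None.
    sampler: Optional[Callable[[random.Random], list]] = None
    decision: Optional[list] = None


def replay(trace: List[Instruction], rng: random.Random) -> List[Instruction]:
    """Return a copy of `trace` in which every sampling decision is redrawn."""
    return [
        Instruction(inst.name, inst.sampler, inst.sampler(rng))
        if inst.sampler is not None
        else inst
        for inst in trace
    ]


# Assumed candidate tilings for a reduction loop of extent 1024.
K_CANDIDATES = [[64, 16], [128, 8], [256, 4]]

trace = [
    Instruction(
        "Sample-Perfect-Tile",
        sampler=lambda rng: rng.choice(K_CANDIDATES),
        decision=[64, 16],
    ),
    Instruction("Split"),
    Instruction("Reorder"),
]

# Each replay keeps the instruction sequence but may land on a new point
# in the design space, because the sampled decision is redrawn.
replayed = replay(trace, random.Random(0))
print([(inst.name, inst.decision) for inst in replayed])
```

The instruction sequence is fixed across replays; only the decisions attached to sampling instructions change.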







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668494320



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions form the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduces the uncertainty in scheduling - one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to search for high-performance
+schedules, either exhaustively or efficiently.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke this opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function, then
+each invocation may produce a different scheduled IRModule, because the random decisions can
+change across runs. Effectively, this is random exploration without a trace. It gives users the
+flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any future
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: extra rules to execute after the trace has been executed, for example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the
+    thread count is a random variable that cannot be determined before actually executing the
+    trace. In this case, we write a postprocessor that errors out when the limit is exceeded.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do the actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example,
+
+**Customize design space in Python**. The design space can be a Python function that performs the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; the decorator
+converts the function into the `apply` method:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm
+in Python by deriving from `PySearchPolicy`; the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can be
+plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do

Review comment:
       The paragraph is not well written. Just revised it:
   
   ```markdown
   The system is designed with the principle of minimalism: Assuming users already know TensorIR
   scheduling APIs, there is no more extra API set to learn, and the previous TensorIR scheduling
   programs will work out of box with meta schedule. It could potentially lower the bar of adopting
   this system.
   ```
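
The cost-model-guided evolutionary search from Section 4.2 in the hunk above can be illustrated with a toy sketch in plain Python. It does not use TVM. The names `mutate`, `postprocess`, and `evolutionary_search`, the `(threads_x, threads_y)` decision tuple, and the way epsilon-greedy is applied during selection are all assumptions made for this example; the RFC text does not spell out these details.

```python
import random
from typing import Callable, List, Optional, Tuple

# Toy stand-in for the sampled decisions of a trace: (threads_x, threads_y).
Decision = Tuple[int, int]
CHOICES = [1, 2, 4, 8, 16, 32, 64, 128]


def mutate(d: Decision, rng: random.Random) -> Decision:
    """Mutator: jump to a neighbor by re-sampling one coordinate."""
    i = rng.randrange(len(d))
    new = list(d)
    new[i] = rng.choice(CHOICES)
    return (new[0], new[1])


def postprocess(d: Decision) -> Optional[Decision]:
    """Postprocessor: reject candidates above 1024 threads per block."""
    return d if d[0] * d[1] <= 1024 else None


def evolutionary_search(
    cost_model: Callable[[Decision], float],
    population: List[Decision],
    rng: random.Random,
    iterations: int = 10,
    epsilon: float = 0.1,
    top_k: int = 4,
) -> List[Decision]:
    """Return surviving candidates, best predicted score first."""
    for _ in range(iterations):
        # Mutate every member a few times and drop invalid children.
        children = [postprocess(mutate(p, rng)) for p in population for _ in range(4)]
        pool = list({c for c in children if c is not None} | set(population))
        pool.sort(key=cost_model, reverse=True)
        # Epsilon-greedy: mostly keep the best-scored, sometimes a random one.
        population = [
            rng.choice(pool) if rng.random() < epsilon else pool[i]
            for i in range(min(top_k, len(pool)))
        ]
    return sorted(set(population), key=cost_model, reverse=True)


if __name__ == "__main__":
    # Toy cost model: prefer more threads; a real one would be a learned predictor.
    survivors = evolutionary_search(lambda d: d[0] * d[1], [(1, 1)], random.Random(0))
    print(survivors)
```

In the described workflow, the survivors would then be compiled and measured on device; the sketch stops at the predicted ranking. The initial population is assumed to be valid already, since only mutated children pass through `postprocess`.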







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668491013



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a single loop into nested loops.
+In the real world, the over-fine granularity of those primitives usually leads to repetitive and
+verbose scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly on its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and meta schedule further allows mixing those
+three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to search for efficient schedules,
+either exhaustively or more efficiently.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke the opaque function
+without doing any extra analysis. The function can be written in C++ or Python, or any language
+that implements the packed function FFI. If sampling instructions are present in the function,
+each invocation may produce a different scheduled IRModule, because the random decisions can
+change across runs. Effectively, this is random exploration without a trace: it gives users the
+flexibility to define arbitrary functions that a trace may not support well (e.g. control flow
+divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a built-in interpreter in our system. If sampling instructions are present in the
+traces, their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploring
+the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets

Review comment:
       Just removed all the occurrence of "we" in the RFC. Thanks for the reminder!
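
To make the trace-replay strategy quoted in the hunk above more concrete, here is a toy sketch in plain Python. It does not use TVM: `perfect_tiles`, `sample_perfect_tile`, and `replay` are hypothetical helpers, and a "trace" is reduced to a list of `(loop extent, number of tiles, max innermost factor)` sampling records. The extent 1024 is inferred from the decisions in the printed trace (32 * 1 * 16 * 2 = 64 * 4 * 2 * 2 = 64 * 16 = 1024).

```python
import random
from typing import List, Tuple

# One sampling record: (loop extent, number of tiles, max innermost factor).
SampleRecord = Tuple[int, int, int]


def perfect_tiles(extent: int, n: int) -> List[Tuple[int, ...]]:
    """Enumerate every ordered way to write `extent` as a product of n factors."""
    if n == 1:
        return [(extent,)]
    result = []
    for f in range(1, extent + 1):
        if extent % f == 0:
            for rest in perfect_tiles(extent // f, n - 1):
                result.append((f,) + rest)
    return result


def sample_perfect_tile(
    extent: int, n: int, max_innermost_factor: int, rng: random.Random
) -> Tuple[int, ...]:
    """Pick one tiling uniformly among those whose innermost factor is small enough."""
    choices = [t for t in perfect_tiles(extent, n) if t[-1] <= max_innermost_factor]
    return rng.choice(choices)


def replay(trace: List[SampleRecord], rng: random.Random) -> List[Tuple[int, ...]]:
    """Re-execute every sampling record with fresh random decisions."""
    return [sample_perfect_tile(e, n, m, rng) for e, n, m in trace]


if __name__ == "__main__":
    trace = [(1024, 4, 16), (1024, 4, 16), (1024, 2, 16)]
    rng = random.Random(0)
    for _ in range(3):
        # Each replay is one new point in the design space.
        print(replay(trace, rng))
```

Uniform sampling over an explicit enumeration is a simplification chosen for readability; it says nothing about how the real sampling instruction draws its decisions.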







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661793079



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in Python**. The design space can be a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in Python**. We provide two ways to define a composite schedule in Python:
+
+Method 1. A simple decorator that converts a Python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, which provides extra functionality such as initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm in Python by deriving from `PySearchPolicy`.

Review comment:
       The syntax is identical to customizing with `PyCompositeSchedule`. I will add it to the doc
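
To make the evolutionary search quoted above concrete, here is a minimal, self-contained sketch in plain Python. It is not the `PySearchPolicy` interface, whose signature the RFC text does not show; every name in it (`evolutionary_search`, `mutate_threads`, `within_cuda_thread_limit`, `toy_cost_model`) is hypothetical, the cost model is a made-up scoring function, and the postprocessor returns `False` instead of erroring out. Compilation and on-device measurement are left out.

```python
import random


def evolutionary_search(population, mutate, postprocess, predict,
                        num_iters=8, top_k=4, epsilon=0.2, seed=0):
    """Toy cost-model-guided evolutionary search with epsilon-greedy selection."""
    rng = random.Random(seed)
    population = list(population)
    for _ in range(num_iters):
        # Mutators propose neighbours; postprocessors filter out invalid ones.
        children = [mutate(parent, rng) for parent in population]
        valid = [child for child in children if postprocess(child)]
        # Deduplicate (keeping order), then rank by the predicted score.
        pool = sorted(dict.fromkeys(population + valid), key=predict, reverse=True)
        # Epsilon-greedy: mostly keep the best, occasionally keep a random one.
        population = [
            rng.choice(pool) if rng.random() < epsilon else pool[rank]
            for rank in range(min(top_k, len(pool)))
        ]
    return sorted(population, key=predict, reverse=True)


def mutate_threads(point, rng):
    """Jump to a neighbour: double or halve one randomly chosen dimension."""
    values = list(point)
    i = rng.randrange(len(values))
    values[i] = values[i] * 2 if rng.random() < 0.5 else max(1, values[i] // 2)
    return tuple(values)


def within_cuda_thread_limit(point):
    """Postprocessor: reject candidates with more than 1024 threads per block."""
    return point[0] * point[1] <= 1024


def toy_cost_model(point):
    """Made-up score preferring about 256 threads; stands in for a learned model."""
    return -abs(point[0] * point[1] - 256)


best = evolutionary_search(
    population=[(1, 1), (2, 4), (32, 32)],
    mutate=mutate_threads,
    postprocess=within_cuda_thread_limit,
    predict=toy_cost_model,
)
print(best)
```

Because invalid children are dropped before ranking, every candidate that survives to the end satisfies the thread limit, which is the role the RFC assigns to postprocessors.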







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668502241



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three

Review comment:
       I will leave step 1 & 2 to TECompiler. Step 3 & 5 are described in our Section 4.2 (exploring the design space)







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661787696



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR. For example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces, whose union is the potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.

Review comment:
       Exactly
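
As an illustration of the trace idea in Section 4.1 quoted above, the sketch below uses a stand-in class that records every instruction applied to it. `ToySchedule`, its methods, and the loop extents are hypothetical and only mimic the shape of the calls in the RFC's matmul example; the real `Schedule` class operates on TensorIR, which this toy does not model.

```python
import math
import random


def prime_factors(n):
    """Return the prime factors of n with multiplicity."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


class ToySchedule:
    """Records each instruction applied to it (a stand-in, not tir.Schedule)."""

    def __init__(self, extents, seed=0):
        self.extents = dict(extents)  # loop name -> extent
        self.trace = []               # (instruction name, inputs, decision)
        self._rng = random.Random(seed)

    def sample_perfect_tile(self, loop, n):
        # Hand each prime factor to a random slot so the product stays exact.
        factors = [1] * n
        for p in prime_factors(self.extents[loop]):
            factors[self._rng.randrange(n)] *= p
        self.trace.append(("Sample-Perfect-Tile", (loop, n), tuple(factors)))
        return factors

    def split(self, loop, factors):
        if math.prod(factors) != self.extents[loop]:
            raise ValueError(f"factors {factors} do not tile loop {loop!r}")
        new_loops = [f"{loop}_{k}" for k in range(len(factors))]
        del self.extents[loop]
        self.extents.update(zip(new_loops, factors))
        self.trace.append(("Split", (loop, tuple(factors)), None))
        return new_loops

    def reorder(self, *loops):
        self.trace.append(("Reorder", loops, None))


sch = ToySchedule({"i": 1024, "j": 1024, "k": 1024})
i_tiles = sch.sample_perfect_tile("i", n=4)
j_tiles = sch.sample_perfect_tile("j", n=4)
k_tiles = sch.sample_perfect_tile("k", n=2)
i_0, i_1, i_2, i_3 = sch.split("i", i_tiles)
j_0, j_1, j_2, j_3 = sch.split("j", j_tiles)
k_0, k_1 = sch.split("k", k_tiles)
sch.reorder(i_0, j_0, i_1, j_1, k_0, i_2, j_2, k_1, i_3, j_3)

for index, (name, _inputs, _decision) in enumerate(sch.trace):
    print(f"Instruction {index}. {name}")
```

Only the three sampling instructions carry a decision; re-running the same trace with different decisions visits a different point of the same design space.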







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668281045



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc.).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the

Review comment:
       Yeah, let's switch to another example. Say, in the multi-level tiling example in the previous section, we have to apply a sequence of `split` instructions on different loops and then reorder them; all those instructions together are called "SSRSRS" multi-level tiling.
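
A rough sketch of that idea, in plain Python on a list of `(loop name, extent)` pairs rather than on TensorIR: the composite below is nothing more than a sequence of `split` calls followed by one `reorder`. The helpers `split`, `reorder` and `multi_level_tiling` are hypothetical toy functions written for this illustration, not the TVM primitives of the same name.

```python
import math


def split(loops, name, factors):
    """Replace loop `name` by len(factors) nested loops of the given extents."""
    index = next(i for i, (loop, _) in enumerate(loops) if loop == name)
    if math.prod(factors) != loops[index][1]:
        raise ValueError(f"factors {factors} do not tile loop {name!r}")
    tiles = [(f"{name}_{k}", factor) for k, factor in enumerate(factors)]
    return loops[:index] + tiles + loops[index + 1:]


def reorder(loops, order):
    """Return the same loops arranged in the requested order."""
    extents = dict(loops)
    if sorted(order) != sorted(extents):
        raise ValueError("order must mention every loop exactly once")
    return [(name, extents[name]) for name in order]


def multi_level_tiling(loops, kinds, tiles, structure="SSRSRS"):
    """Composite: split every loop, then interleave the tiles per `structure`.

    kinds maps each loop to "S" (spatial) or "R" (reduction);
    tiles maps each loop to its tile sizes, one per matching letter.
    """
    original = [name for name, _ in loops]
    for name in original:
        if len(tiles[name]) != structure.count(kinds[name]):
            raise ValueError(f"wrong number of tile sizes for loop {name!r}")
        loops = split(loops, name, tiles[name])
    level = {"S": 0, "R": 0}
    order = []
    for kind in structure:
        # Emit the next tile of every loop of this kind, in original order.
        order += [f"{n}_{level[kind]}" for n in original if kinds[n] == kind]
        level[kind] += 1
    return reorder(loops, order)


tiled = multi_level_tiling(
    loops=[("i", 8192), ("j", 8192), ("k", 2048)],
    kinds={"i": "S", "j": "S", "k": "R"},
    tiles={"i": [16, 8, 8, 8], "j": [16, 8, 8, 8], "k": [256, 8]},
)
print([name for name, _ in tiled])
```

The tile sizes are the ones from the manual schedule in Section 3.1; the loop extents are assumed here so that the factors multiply out exactly.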







[GitHub] [tvm-rfcs] tkonolige commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647867451



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)

Review comment:
       Can we drop the AutoTensorIR name? We can just call all of this Meta Schedule.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676859646



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Taking the code snippet in the previous section as an example, a sequence of `split`s
+is invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used in contrast to the word *primitive*: it means a transformation *composed* of those
+*primitives*.
+
+For example, suppose there is a composite schedule called `Inline-All-Elementwise-Operations`, which
+inlines all the elementwise computation into their consumers. Applying it to the following TensorIR:
+
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("B") ...:
+      B[i, j] = A[i, j] + 1
+  for i, j in ...:
+    with tir.Block("C") ...:
+      C[i, j] = B[i, j] + 1
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = C[i, j] + 1
+
+sch = tir.Schedule(example_func)
+InlineAllElementwiseOperations().apply(sch, sch.get_block("D"))
+print(tvm.script.asscript(sch.mod))
+```
+
+The result after applying the composite schedule is:
 
-To make it more convenient and modular, we allow users to register "composite schedules" that apply
-a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
-schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = A[i, j] + 1 + 1 + 1
+```
 
 ### 3.3. AutoTVM-style Design Space Description
 
-Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
-these instructions parametrize the schedule from a single deterministic point to a space supported
-by random variables (tile size, etc.), making it possible for developers to describe the design
-space with meta schedule APIs.
+Meta schedule extends the schedule DSL with a set of new schedule primitives with randomness,
+called **sampling instructions**. These primitives do not transform the TensorIR,
+but instead will generate random decisions from specific distributions in each run,

Review comment:
       That's correct. In general, it allows developers to define either a finite or an infinite space of potential schedulings.
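
For the finite case, a minimal pure-Python sketch of such a space is below. It is not TVM code: `perfect_tiles` and `sample_tile` are hypothetical helpers that only mimic, under that assumption, the set of choices a perfect-tile sampling instruction would draw from for a loop of known extent.

```python
import random
from typing import Iterator, Tuple


def perfect_tiles(extent: int, n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every n-tuple of positive integers whose product equals `extent`."""
    if n == 1:
        yield (extent,)
        return
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            # Fix the outermost factor, then tile the remaining extent.
            for rest in perfect_tiles(extent // factor, n - 1):
                yield (factor,) + rest


def sample_tile(extent: int, n: int, rng: random.Random) -> Tuple[int, ...]:
    """Draw one tiling uniformly at random (hypothetical stand-in for a sampling instruction)."""
    return rng.choice(list(perfect_tiles(extent, n)))


# The whole (finite) space for a loop of extent 16 split into 2 tiles.
space = list(perfect_tiles(16, 2))
print(space)

# One random point in the space for a loop of extent 1024 split into 4 tiles.
print(sample_tile(1024, 4, random.Random(0)))
```

Each call to the sampler picks a single point, while the space itself is the full set of tuples. It is finite here because a fixed loop extent has only finitely many factorizations.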







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668476702



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after being scheduled, because the random decisions
+may change across runs. Effectively, it is equivalent to
+random exploration without trace, allowing the flexibility for users to define arbitrary functions
+that trace may not well support (e.g. control flow divergence based on the value of intermediate
+random variables), but it rules out any future trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present on the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including rejecting

Review comment:
       "our system" mainly refers to the strategy of space exploration, and the improvements mainly refer to system performance. Take the early rejection of invalid cuda schedules as an example, we can reject many such schedules by simply looking at the trace (faster), instead of actually running it (slower)

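As a rough illustration of rejecting a schedule by looking only at the trace, here is a toy sketch. The `SampleTile` record and the `thread_index` field are hypothetical and are not part of the RFC's trace format; the sketch simply assumes that one sampled factor per loop ends up bound to a CUDA thread axis, and it uses the 1024-thread limit mentioned in the RFC text.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence

MAX_THREADS_PER_BLOCK = 1024  # CUDA limit referred to in the RFC text


@dataclass
class SampleTile:
    """Toy record of one sampling instruction together with its decision."""

    loop: str
    decision: Sequence[int]
    thread_index: int  # hypothetical: which factor is bound to a thread axis


def threads_per_block(trace: List[SampleTile]) -> int:
    """Multiply the thread-bound factors recorded in the trace."""
    return prod(inst.decision[inst.thread_index] for inst in trace)


def is_obviously_invalid(trace: List[SampleTile]) -> bool:
    """Reject using the recorded decisions only, without running the schedule."""
    return threads_per_block(trace) > MAX_THREADS_PER_BLOCK


ok_trace = [
    SampleTile("i", [32, 1, 16, 2], thread_index=2),
    SampleTile("j", [64, 4, 2, 2], thread_index=2),
]
bad_trace = [
    SampleTile("i", [1, 1, 64, 16], thread_index=2),
    SampleTile("j", [1, 8, 32, 2], thread_index=2),
]
print(is_obviously_invalid(ok_trace), is_obviously_invalid(bad_trace))
```

The check only reads integers stored in the trace, so it is cheap compared with replaying the trace and lowering the resulting IRModule.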






[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668494320



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present on the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including rejecting
+obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code elimination to
+simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads
+    per block must not exceed 1024, but this number is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out when
+    the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily swapped for a customized python implementation. For example,
+
+**Customize design space in python**. The design space can be a python function that performs the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the
+decorated function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, so extensions can be
+plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do

Review comment:
       The paragraph is not well written. Just revised it:
   
   ```markdown
   The system is designed with the principle of minimalism: Assuming users already know TensorIR
   scheduling APIs, there is no extra set of APIs to learn, and existing TensorIR scheduling
   programs will work out of the box with meta schedule. This could potentially lower the bar for
   adopting this system.
   ```







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647850086



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.

Review comment:
       They serve different purposes.
   
   Section 3.3 introduces a concept called "Composite Schedule", which could generate multiple primitives, in contrast to a single "schedule primitive". Note that this concept is independent of the search - users can apply "Composite Schedule" manually on a block, just like applying a schedule primitive.
   
   Section 3.4 applies "Composite Schedule" to each block in TensorIR to generate the design space, and this process is called "automatic design space generation". Note that there are alternative ways to generate design space, e.g. users could provide a python function as the design space.
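
   As a rough illustration of the difference, here is a toy sketch of automatic design space generation. It is not the actual implementation: a "schedule" is modeled as a plain list of instruction strings, and the rule names, the `_spatial` naming convention and `generate_design_space` are all hypothetical.

```python
from typing import Callable, List

# Toy model: a schedule is the list of instruction strings applied so far.
# A rule maps (schedule, block name) to one or more resulting schedules.
Schedule = List[str]
Rule = Callable[[Schedule, str], List[Schedule]]


def inline_if_spatial(sch: Schedule, block: str) -> List[Schedule]:
    # Hypothetical analysis: blocks named "*_spatial" count as pure spatial.
    if block.endswith("_spatial"):
        return [sch + [f"compute_inline({block})"]]
    return [sch]


def maybe_tile(sch: Schedule, block: str) -> List[Schedule]:
    # Leaves spatial blocks alone; for other blocks, forks the schedule into
    # one candidate with tiling and one without.
    if block.endswith("_spatial"):
        return [sch]
    return [sch + [f"multi_level_tiling({block})"], sch]


def generate_design_space(blocks: List[str], rules: List[Rule]) -> List[Schedule]:
    # Apply every rule to every block; each rule may fork the candidates.
    candidates: List[Schedule] = [[]]
    for block in blocks:
        for rule in rules:
            candidates = [out for sch in candidates for out in rule(sch, block)]
    return candidates
```

   Each rule can also be called directly on a single block, which corresponds to applying a composite schedule manually; `generate_design_space` is only one way to combine them.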
   







[GitHub] [tvm-rfcs] FrozenGene commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
FrozenGene commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648058766



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)

Review comment:
       If so, maybe we should alias it to `AutoTIR` so that people could search it in code.







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-878885449


   @areusch Thank you sooo much for reviewing this RFC! I really appreciate it that you went over the text very carefully and provided very helpful suggestions! Just finished a pass revising the RFC. Would you like to take another look? Thanks a lot!





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668395363



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of the meta schedule design space, and meta schedule further allows mixing those three
+styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine the generic rules that Ansor provides with
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation results in a different IRModule after being scheduled because the random decisions are

Review comment:
       Thanks Andrew! Yeah it looks better.
   
   > shouldn't this process export a log of the random variables chosen to fill the trace, so that the IRModule then is not so precious
   
   Good question! Yes, we do record the decisions made at each instruction on the trace. BTW, it is actually possible for users to force decision making on the sampling instructions, e.g.:
   
   ```python
   sch.sample_perfect_tile(loop, n=4, decision=[4, 4, 8, 32])
   ```
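
   As a rough illustration of why a forced decision makes a replay deterministic, here is a toy stand-in for the sampling instruction. It is a sketch under assumptions, not the actual implementation: it works on a plain integer extent rather than a loop, and the `rng` parameter and the validation logic are hypothetical.

```python
import random
from typing import List, Optional


def sample_perfect_tile(
    extent: int,
    n: int,
    decision: Optional[List[int]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Toy stand-in: split `extent` into `n` integer factors whose product is `extent`."""
    if decision is not None:
        # A forced decision bypasses sampling, but must still be a valid tiling.
        product = 1
        for factor in decision:
            product *= factor
        if len(decision) != n or product != extent:
            raise ValueError(f"{decision} is not a perfect {n}-way tiling of {extent}")
        return list(decision)
    rng = rng or random.Random()
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Pick any divisor of what is left, so the product stays exact.
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        choice = rng.choice(divisors)
        factors.append(choice)
        remaining //= choice
    # The last factor absorbs whatever remains.
    factors.append(remaining)
    return factors
```

   With an extent of 4096, passing `decision=[4, 4, 8, 32]` always returns that tiling, while omitting `decision` draws a fresh random tiling on each call.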
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664835966



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it does not scale to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure python or C++ or both. For example, one can develop a new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling: one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to search, exhaustively or efficiently, for high-performance schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without taking any advantage of the trace.

Review comment:
       Not necessarily. Users could provide a schedule function that still involves randomness, e.g.
   
   ```
   def sch_fn(sch):
     ... = sch.sample_perfect_tile(...)
   ```
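
To make the point above concrete, here is a minimal, self-contained sketch in plain Python. It does not use TVM: `ToySchedule` and its `sample_perfect_tile` are hypothetical stand-ins that only mimic the idea of a sampling instruction, so that replaying the same schedule function from scratch can land on different points of the design space.

```python
import random


class ToySchedule:
    """Hypothetical stand-in for a schedule object; not the TVM API."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.decisions = []  # every sampled decision, in order

    def sample_perfect_tile(self, extent, n):
        """Randomly split `extent` into `n` integer factors whose product is `extent`."""
        factors = []
        remaining = extent
        for _ in range(n - 1):
            divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
            f = self.rng.choice(divisors)
            factors.append(f)
            remaining //= f
        factors.append(remaining)
        self.decisions.append(factors)
        return factors


def sch_fn(sch):
    # The user-provided schedule function still contains a sampling step.
    return sch.sample_perfect_tile(extent=1024, n=4)


def replay(num_trials):
    """Replay sch_fn from scratch several times, each with a fresh toy schedule."""
    return [sch_fn(ToySchedule(seed)) for seed in range(num_trials)]


samples = replay(num_trials=8)
for tiles in samples:
    print(tiles)
```

Each replay starts from a fresh schedule, so the sampled tile sizes may differ between runs even though no trace is consulted.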
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668387313



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, in C++, or in both. For example, one can
+  develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the meta schedule design space, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the design
+space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes
+works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a
+Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)

Review comment:
       It is a good question. Right now I haven't put too much thought into this; I just name those variables with "absolutely correct names". We can definitely improve the ReprPrinter in the future with some heuristics.







[GitHub] [tvm-rfcs] junrushao1994 edited a comment on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 edited a comment on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-875121662


   @tkonolige thanks for the review!
   
   > The biggest one I can think of is the fact that we will have to rewrite all our existing schedules to take advantage of the new infrastructure.
   
   Given that most of the time we use sketch generation (in Ansor's terminology) to generate schedules automatically, we can just remove most of the schedules written in TE. On the other hand, we do need to rewrite all of Ansor's sketch rules, including (defined in `src/auto_scheduler/search_policy/sketch_policy_rules.h`):
   - Always-Inline
   - Multi-Level-Tiling
   - Multi-Level-Tiling-with-Fusion
   - Add-Cache-Read
   - Add-Cache-Write
   - Add-RFactor
   - Simplify-Compute-with-Const-Tensor
   - Cross-Thread-Reduction
   - Special-Compute-Location
   
   > Also, will tuning be slower if we allow users to define their own search rules?
   
   The search rule is only executed once to obtain the search space before we explore it, and it is usually fairly fast (within a second). So if we only customize our own search rule (i.e. sketch rule in Ansor, schedule rule in meta schedule), we won't observe performance degradation.
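
The cost argument can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions, not how meta schedule implements search: `generate_design_space` and `explore` are hypothetical names, the "trace" is a list of strings, and the only sampled quantity is a two-way split of an extent of 64. The point is the call pattern: the rules run once, while the per-trial loop only re-decides the sampling instruction.

```python
import random

DIVISORS_OF_64 = [1, 2, 4, 8, 16, 32, 64]


def generate_design_space():
    """Hypothetical one-off step: apply the rules once and return a trace
    whose sampling instruction is still undecided."""
    return ["sample_tile", "split", "reorder"]


def explore(trace, num_trials, seed=0):
    """Hypothetical search loop: each trial only picks a new sampling decision."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_trials):
        outer = rng.choice(DIVISORS_OF_64)
        decisions = {"sample_tile": [outer, 64 // outer]}
        candidates.append((trace, decisions))
    return candidates


trace = generate_design_space()            # executed once
candidates = explore(trace, num_trials=5)  # loop body executed once per trial
for _, decisions in candidates:
    print(decisions)
```

Every candidate shares the same trace object, so a slower custom rule would add its cost only once, before exploration starts.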
   
   





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676821034



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -64,8 +76,17 @@ primitives form a domain-specific language (DSL) describing the transformation o
 
 ## 3. Guide-level explanation
 
-In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
-and auto-generate the design space.
+Meta Schedule DSL is a flexible way to define or auto-generate the design space.
+
+This section introduces the syntax of the meta schedule DSL and its usage in describing and
+auto-generating the design space; more specifically, its APIs for:
+1) Manually constructing a schedule using existing schedule primitives (Section 3.1);
+2) Defining composite schedules to simplify applying a sequence of schedule primitives (Section 3.2);
+3) Describing a design space of possible schedules,
+a.k.a. AutoTVM-style schedule templates (Section 3.3);
+4) Automatically generating the design space, a.k.a. Ansor-style search rules (Section 3.4);

Review comment:
       Done. Just replaced all the occurrences of Ansor with AutoScheduler :-)







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668311883



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, in C++, or in both. For example, one can
+  develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are

Review comment:
       Basically composite schedule rules schedule TensorIR, while AutoScheduler (Ansor) schedules on its own internal mini DSL







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668319732



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are

Review comment:
       Revised as follows:
   
   ```markdown
   AutoScheduler (Ansor) generates schedule templates by applying a set of **SearchRule**s to each stage.
   Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
   maintained mini IR.
   
   Composite schedule rules work in a similar way when scheduling TensorIR, as introduced in Section 3.2.
   Each rule analyzes the TensorIR and applies schedule primitives directly to it accordingly.
   Applying such rules to each TensorIR block in a certain order (e.g. Post-DFS Order)
   generates a sequence of schedule primitives.
   If sampling instructions are present in this sequence,
   the support of the probability space forms a design space of possible schedulings.
   This process is similar to the *sketch generation* phase in AutoScheduler.
   ```
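
   The process described in the revised text can be illustrated with a toy sketch in plain
   Python. It does not use TVM: `Block`, `inline_rule`, `tiling_rule`, `perfect_tiles` and
   `generate_trace` are hypothetical names invented for illustration, and the tuples in the
   trace only mimic schedule instructions. The sketch assumes rules are applied to each block
   in post-order and that a sampling step picks one perfect tiling at random.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a TensorIR block; not a TVM class."""
    name: str
    is_reduction: bool
    children: list = field(default_factory=list)


def post_order(block):
    """Yield blocks so that every child is visited before its parent."""
    for child in block.children:
        yield from post_order(child)
    yield block


def perfect_tiles(extent, n):
    """List every ordered way to write `extent` as a product of n positive integers."""
    if n == 1:
        return [[extent]]
    result = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                result.append([factor] + rest)
    return result


def inline_rule(block, trace, rng):
    """Toy rule: record an inline instruction for blocks without a reduction."""
    if not block.is_reduction:
        trace.append(("compute_inline", block.name))


def tiling_rule(block, trace, rng, extent=8, n=2):
    """Toy rule: sample a tiling for reduction blocks, then record a split."""
    if block.is_reduction:
        decision = rng.choice(perfect_tiles(extent, n))
        trace.append(("sample_perfect_tile", block.name, decision))
        trace.append(("split", block.name, decision))


def generate_trace(root, rules, seed):
    """Apply every rule to every block in post-order and collect the instructions."""
    rng = random.Random(seed)
    trace = []
    for block in post_order(root):
        for rule in rules:
            rule(block, trace, rng)
    return trace


root = Block("matmul", True, [Block("pad", False)])
trace = generate_trace(root, [inline_rule, tiling_rule], seed=0)
for instruction in trace:
    print(instruction)
```

   Here the sequence of instructions is fixed by the rules, while the sampled decision varies
   with the seed. The list returned by `perfect_tiles(8, 2)` is the support of that single
   sampling step, so in this toy setting it is the whole design space of the trace.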







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r666545934



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the

Review comment:
       Thanks @areusch for pointing this out! The very first version of this draft referred to meta schedule as a probabilistic programming language, but later Tristan and I decided to remove this term (because it's not super important from a developer's perspective, and it can cause potential confusion). It turns out that we forgot to remove the word "probabilistic" in the first paragraph.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668275733



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply

Review comment:
       @areusch what do you think of this example? I can add it to the RFC







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668494320



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+Each SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the design space of meta schedule, and that meta schedule further allows mixing
+those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs).
+The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after scheduling, because the random decisions can
+change across runs. Effectively, this is equivalent to random exploration without a trace. It gives
+users the flexibility to define arbitrary functions that a trace may not support well (e.g. control
+flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. the replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: CUDA imposes a hard limit of 1024 threads per block, but the
+    number of threads is a random variable that cannot be determined before actually executing the
+    trace. In this case, we write a postprocessor that errors out when the condition is not
+    satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest predicted scores are compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to a customized Python implementation. For example,
+
+**Customize design space in python**. The design space can be a Python function that does the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(self, ...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. If the `initialize` method is empty, a decorator is available as syntactic sugar; it
+converts the decorated function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, and extensions can
+be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do

Review comment:
       The paragraph is not well written. Just revised it:
   
   ```markdown
   The system is designed with the principle of minimalism: assuming users already know the TensorIR
   scheduling APIs, there is no extra API set to learn; existing programs that schedule TensorIR
   still work out of the box with the meta schedule DSL. This could potentially lower the bar for
   adopting this system.
   ```







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668395730



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor
+are all subsets of the design space expressible in meta schedule, and meta schedule further allows
+mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are represented more
+naturally by writing one or more schedule functions in meta schedule with sampling instructions.
+The execution traces generated by the schedule functions form the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the
+three special cases, our system allows developers to combine the generic rules that Ansor provides
+with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: each concrete set of sampling decisions corresponds to one
+point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which
+set of tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# enable tracing of the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reflects the schedule as a snippet of
+a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after scheduling, because the random decisions can
+change across runs. Effectively, this is equivalent to random exploration without a trace. It gives
+users the flexibility to define arbitrary functions that a trace may not support well (e.g. control
+flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and

Review comment:
       Oh it means any style of design space specification as mentioned in Section 3.5. Will add the clarification here
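
To make the idea of replaying a trace concrete, here is a minimal sketch in plain Python. It does not use TVM: the list-of-tuples trace format and the helpers `sample_perfect_tile`, `replay` and `random_search` are hypothetical stand-ins written for illustration only, and loops are referred to by name. Each replay draws fresh decisions for the sampling instructions, so repeated replay visits different points of the same design space.

```python
import random


def sample_perfect_tile(extent, n, rng):
    """Randomly factor `extent` into `n` integers whose product is `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)
    return factors


# Hypothetical trace format: (instruction name, arguments).
TRACE = [
    ("sample_perfect_tile", {"loop": "i", "n": 4}),
    ("sample_perfect_tile", {"loop": "j", "n": 4}),
    ("sample_perfect_tile", {"loop": "k", "n": 2}),
    ("split", {"loop": "i"}),
    ("split", {"loop": "j"}),
    ("split", {"loop": "k"}),
]


def replay(trace, extents, rng):
    """Replay the trace once, re-drawing every sampling decision."""
    decisions = {}
    tiling = {}
    for name, args in trace:
        loop = args["loop"]
        if name == "sample_perfect_tile":
            decisions[loop] = sample_perfect_tile(extents[loop], args["n"], rng)
        elif name == "split":
            # A real interpreter would transform the IR here; we only record the factors.
            tiling[loop] = decisions[loop]
    return tiling


def random_search(trace, extents, num_trials, seed=0):
    """Random search: each replay yields one candidate point in the design space."""
    rng = random.Random(seed)
    return [replay(trace, extents, rng) for _ in range(num_trials)]


candidates = random_search(TRACE, {"i": 1024, "j": 1024, "k": 1024}, num_trials=8)
```

In a real system each candidate would then be built and measured; the sketch stops at producing the candidate tilings.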







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676830400



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-executing either a function or
+a trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule can repeatedly invoke such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs).
+Effectively, it is equivalent to random exploration without trace,
+allowing the flexibility for users to define arbitrary functions
 that trace may not well support (e.g. control flow divergence based on the value of intermediate
 random variables), but it forbids future opportunity of any trace-based analysis.
 
-**Random search by replaying traces.** Traces are obtained from a design space generator, and
-replayed with a builtin interpreter in our system. If sampling instructions are present on the
-traces, then their random decisions are mutated during each replay, i.e. jumps to a new point in the
+#### Random search by replaying traces
+
+A builtin interpreter directly replays the traces obtained
+from manual schedule, template-based or template-free design space generation.
+If sampling instructions are present on the traces,
+then their random decisions are mutated during each replay, i.e. jumps to a new point in the
 design space. Therefore, repetitive replay of those traces are equivalent to exploration of the
-design space. Our system could potentially benefit from trace-based analysis, including rejecting
-obviously invalid schedules (e.g. using too much CUDA resources), doing dead-code elimination to
-simplify a trace, extracting trace-based features used in the cost model, etc.
+design space. Meta schedule could potentially benefit from trace-based analysis, making the search
+more efficient, including rejecting obviously invalid schedules (e.g. using too many CUDA
+resources), doing dead-code elimination to simplify a trace, extracting trace-based features used
+in the cost model, etc.
 
-**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
-of rules:
+#### Cost-model-guided evolutionary search
+
+A more efficient exploration strategy, introduced in Ansor.

Review comment:
       Just added section 5.1 and a citation here
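
For readers who want a feel for the mutator / postprocessor / cost model loop, below is a rough sketch in plain Python. It is not the meta schedule implementation: `mutate`, `postprocess`, `cost_model` and `evolutionary_search` are hypothetical names, the cost model is a toy scoring function rather than a learned model, and binding the second and third tile factors to threads is an assumption made purely for illustration.

```python
import random

MAX_THREADS = 1024  # per-block thread limit used by the postprocessor


def mutate(tiles, rng):
    """Mutator: move a factor of 2 between two positions, keeping the product fixed."""
    new = list(tiles)
    src, dst = rng.sample(range(len(new)), 2)
    if new[src] % 2 == 0:
        new[src] //= 2
        new[dst] *= 2
    return new


def postprocess(tiles):
    """Postprocessor: reject candidates that exceed the thread limit.

    Assumption for illustration: tiles[1] * tiles[2] is the number of threads.
    """
    return tiles if tiles[1] * tiles[2] <= MAX_THREADS else None


def cost_model(tiles):
    """Toy stand-in for a learned cost model: prefers balanced tile sizes."""
    return -float(max(tiles) - min(tiles))


def evolutionary_search(init, iterations, population, top_k, epsilon, seed=0):
    rng = random.Random(seed)
    pool = [init]
    for _ in range(iterations):
        children = []
        for _ in range(population):
            child = postprocess(mutate(rng.choice(pool), rng))
            if child is not None:
                children.append(child)
        # Keep the candidates with the highest predicted scores.
        pool = sorted(pool + children, key=cost_model, reverse=True)[:population]
    # Epsilon-greedy: usually take the best-ranked candidate, sometimes a random one.
    picked = []
    for rank in range(min(top_k, len(pool))):
        picked.append(rng.choice(pool) if rng.random() < epsilon else pool[rank])
    return picked


to_measure = evolutionary_search(
    [4096, 1, 1, 1], iterations=20, population=16, top_k=4, epsilon=0.1
)
```

The candidates in `to_measure` are the ones that would be compiled and measured on device; the measurements could then be fed back to update the cost model.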







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668326765



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+

Review comment:
       That makes sense. Thanks!







[GitHub] [tvm-rfcs] tqchen commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648528792



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)

Review comment:
       Please add the ASF license header.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648033968



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.

Review comment:
       Oh I forgot to remove the PPL term in this section. I mentioned it in previous drafts but later decided not to go this deep. Thanks for the reminder!
   
   Replaying this DSL (without remembering previous decisions) is definitely a way to explore the design space, relying on the fact that random choices may differ between each run.
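
As a rough illustration of this replay idea (a sketch, not the RFC's implementation), the loop below re-runs a schedule function from scratch on every trial and keeps the cheapest result. Everything here is an assumption made for the example: `sample_perfect_tile` is a pure-Python stand-in for the sampling instruction (it is not uniform over tilings), the loop extents of 1024 are invented, and `toy_cost` replaces a real hardware measurement.

```python
import random


def sample_perfect_tile(extent, n, rng):
    """Hypothetical stand-in: draw n positive factors whose product is `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)  # the last factor absorbs whatever is left
    return factors


def schedule_fn(rng):
    """One run of the schedule program: every sampling call draws afresh."""
    return {
        "i": sample_perfect_tile(1024, 4, rng),
        "j": sample_perfect_tile(1024, 4, rng),
        "k": sample_perfect_tile(1024, 2, rng),
    }


def toy_cost(decisions):
    """Made-up cost (not a measurement): prefer innermost tiles close to 8."""
    return sum(abs(tiles[-1] - 8) for tiles in decisions.values())


def replay_search(num_trials, seed=0):
    """Replay the schedule program num_trials times and keep the best draw."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(num_trials):
        candidate = schedule_fn(rng)  # no memory of earlier decisions
        cost = toy_cost(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

Calling `replay_search(50)` returns the best set of tile decisions seen across 50 independent replays. Nothing learned in one trial informs the next, which is why this strategy is simple but not sample-efficient.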
   







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-887082344


   @areusch @tqchen Thanks for your time reviewing the RFC! Just addressed all the comments and please take another look when you guys got time :-)





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668328703



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python, C++, or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are more naturally
+represented by writing one or more schedule functions in meta schedule with sampling
+instructions. The execution traces generated by the schedule functions are the design space to be

Review comment:
       Oh the concept is not introduced until the following section. Going to change the description here to "the probability space supported by the sampling instructions"
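
To make "the probability space supported by the sampling instructions" a little more concrete, here is a small sketch that enumerates the support for two perfect-tile samples and takes their product. The helper `perfect_tiles` and the tiny loop extents (8 and 4) are assumptions chosen for readability; they are not part of the RFC or of the TVM API.

```python
from itertools import product


def perfect_tiles(extent, n):
    """All ordered n-tuples of positive integers whose product is `extent`."""
    if n == 1:
        return [(extent,)]
    tiles = []
    for d in range(1, extent + 1):
        if extent % d == 0:
            # Fix the outermost factor d, then tile the remainder recursively.
            tiles.extend((d,) + rest for rest in perfect_tiles(extent // d, n - 1))
    return tiles


# Support of each sampling instruction (hypothetical loop extents).
i_space = perfect_tiles(8, 2)
k_space = perfect_tiles(4, 2)

# With independent samples, the joint design space is the Cartesian product.
joint_space = list(product(i_space, k_space))
```

Each element of `joint_space` is one deterministic schedule, i.e. a single point in the design space. A schedule with no sampling instructions corresponds to a space with exactly one such point.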







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r677650502



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,604 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/
+* GitHub Issue: https://github.com/apache/tvm/issues/8473
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
+approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is the 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling.** In TVM,
+both TensorIR and TE allow direct or indirect IR transformation guided by a

Review comment:
       Thanks Andrew! I can send a follow-up PR!







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664878703



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called "**scheduling**", and each transformation is called a "**schedule primitive**". These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design space for each operator. Therefore, it is inextensible to hundreds of operators and a variety of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined "search rules". However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the system can be customized easily in pure Python, C++, or both. For example, one can develop a new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very basic transformation of the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty in scheduling - one instance of sampling results in one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which faithfully reflects the schedule as a snippet of a Python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule function without taking any advantage of the trace.

Review comment:
       This paragraph did not clearly express the difference between "replaying an opaque schedule function" and "replaying a trace". I am going to rephrase it a little.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676879435



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-executing either a function or a
+trace with a built-in interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule can repeatedly invoke such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)

Review comment:
       Yeah, I should state this more directly: a trace with decisions strictly guarantees reproducibility.
   
   In detail, a trace is defined as:
   
   ```python
   class Trace:
     instructions: List[Instruction]
     decisions: Dict[Instruction, Any]
   ```
   
   For each sampling instruction in the trace, if it has a corresponding entry in the decisions dict, then the output is uniquely determined by the decision, hence reproducibility is guaranteed (Example 1); if a corresponding entry is not present, then interpreting the trace introduces randomness (Example 2).
   
   ```python
   # Example 1. Trace with deterministic result
   l1, l2 = sch.sample_perfect_tile(loop, n=2, decision=[4, 32])  # Deterministic l1 = 4, l2 = 32
   # Example 2. Trace with randomized result
   l1, l2 = sch.sample_perfect_tile(loop, n=2)  # l1 and l2 are random
   ```
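
To make the two cases concrete, here is a minimal sketch in plain Python. It is not TVM's implementation: `Instruction`, `Trace`, `replay` and the `sample_perfect_tile` helper below are hypothetical stand-ins, and decisions are keyed by instruction index rather than by instruction object, purely to keep the sketch short.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Instruction:
    # Hypothetical stand-in for a schedule instruction; not a TVM class.
    name: str
    extent: int = 0  # loop extent, only used by sampling instructions
    n: int = 0       # number of tile factors to sample


@dataclass
class Trace:
    instructions: List[Instruction]
    # Keyed by instruction index (an assumption made for brevity).
    decisions: Dict[int, Any] = field(default_factory=dict)


def sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Draw n positive factors whose product equals extent."""
    factors, remaining = [], extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factors.append(rng.choice(divisors))
        remaining //= factors[-1]
    factors.append(remaining)
    return factors


def replay(trace: Trace, seed: Optional[int] = None) -> List[List[int]]:
    """Interpret the trace and return the outcome of each sampling instruction."""
    rng = random.Random(seed)
    outcomes = []
    for index, inst in enumerate(trace.instructions):
        if inst.name != "sample_perfect_tile":
            continue  # non-sampling primitives introduce no randomness
        if index in trace.decisions:
            # A recorded decision pins the outcome, so replay is reproducible.
            outcomes.append(list(trace.decisions[index]))
        else:
            # No decision recorded: the interpreter samples afresh.
            outcomes.append(sample_perfect_tile(inst.extent, inst.n, rng))
    return outcomes


insts = [Instruction("sample_perfect_tile", extent=128, n=2), Instruction("split")]
pinned = Trace(insts, decisions={0: [4, 32]})  # corresponds to Example 1
unpinned = Trace(insts)                        # corresponds to Example 2
```

Replaying `pinned` yields the recorded tile sizes regardless of the seed, while replaying `unpinned` may yield a different perfect tiling of 128 on each run.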
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676834667



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulate its internal IR,

Review comment:
       Both are internal, will clarify







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664877290



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)

Review comment:
       I thought for a bit, hmmm, yeah it might be the way to say "okay let's stick with the name meta schedule instead"

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)

Review comment:
       thanks for the suggestion Tristan!







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668296259



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python, C++, or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,

Review comment:
       Good questions. I am going to clarify that "sampling instructions" are a new set of schedule primitives which provide randomness
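
As a rough illustration of how one sampling instruction turns a single schedule into a space of schedules, the sketch below enumerates every perfect tiling of a loop extent and then draws one at random. This is plain Python under stated assumptions, not the actual primitive: `perfect_tiles` is a hypothetical helper, and the real `sample_perfect_tile` may apply further constraints (the printed trace in the RFC shows a `max_innermost_factor` argument, for example).

```python
import random
from typing import Iterator, Tuple


def perfect_tiles(extent: int, n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every n-tuple of positive integers whose product is extent."""
    if n == 1:
        yield (extent,)
        return
    for d in range(1, extent + 1):
        if extent % d == 0:
            # Fix the outermost factor, then tile the remaining extent.
            for rest in perfect_tiles(extent // d, n - 1):
                yield (d,) + rest


space = list(perfect_tiles(8, 2))  # the whole design space for one loop
choice = random.choice(space)      # one sampled point in that space
```

A manual schedule corresponds to fixing `choice` by hand; a sampling instruction leaves it open so that the search can explore `space`.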







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648041157



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)

Review comment:
       probably we could rename AutoTensorIR to AutoTIR, but I thought for quite a while and finally decided to add this name - the rationale is that we have been advocating "AutoTIR" as the code name for half a year, so it might help people who are searching for this term. In the meantime, it is ensured that AutoTIR doesn't appear in the body of the document, so that we can have more consistency







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668500696



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the meta schedule design space, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repeatedly re-executes it with purely random sampling decisions.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the number of threads per block must not exceed 1024, but the thread count is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict their performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be easily switched to customized python implementation. For example,
+
+**Customize design space in python**. The design space can be a python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into “SSRSRS” 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in python:
+
+Method 1. A simple decorator that converts a python function to a composite schedule
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, providing extra functionalities like initialization
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in python as well by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database

Review comment:
       @areusch Right. It is not very relevant to the core idea but is definitely a component of the system. Will add a short section describing what a database is.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668508988



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and AutoScheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure python or C++ or both. For example, one can develop a
+  new design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
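The space that a perfect-tile sampling instruction draws from can be illustrated without TVM. The following is a minimal sketch, assuming that a "perfect tile" means an ordered list of `n` positive factors whose product equals the loop extent. The helper names `perfect_tiles` and `toy_sample_perfect_tile` are hypothetical, and the real primitive's sampling distribution and extra constraints are not modeled here.

```python
import random
from typing import Iterator, List, Tuple


def perfect_tiles(extent: int, n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every way to write `extent` as an ordered product of `n` positive factors."""
    if n == 1:
        yield (extent,)
        return
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                yield (factor,) + rest


def toy_sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    """Toy stand-in: draw one tiling uniformly from the enumerated space (an assumption)."""
    return list(rng.choice(list(perfect_tiles(extent, n))))


# Every point in this list is one deterministic choice of tile sizes for a loop of extent 8.
print(list(perfect_tiles(8, 2)))

# One random point for a loop of extent 1024 split into 4 tiles.
print(toy_sample_perfect_tile(1024, 4, random.Random(0)))
```

Each call to the sampler picks one point, so a schedule containing three such instructions describes the Cartesian product of three such spaces.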
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
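How applying rules block by block generates a design space can be sketched in plain Python. This is only an illustration under stated assumptions: a "schedule" is modeled as a tuple of strings, a rule is a function returning one or more new states, and the names `inline_spatial`, `multi_level_tiling` and `apply_rules` are hypothetical. The visiting order is supplied by the caller, and the real traversal is not modeled.

```python
from typing import Callable, List, Tuple

State = Tuple[str, ...]                      # toy "schedule": the primitives applied so far
Rule = Callable[[State, str], List[State]]   # (state, block name) -> one or more new states


def inline_spatial(state: State, block: str) -> List[State]:
    """Toy rule: inline blocks whose name marks them as elementwise."""
    if block.startswith("elemwise"):
        return [state + (f"compute_inline({block})",)]
    return [state]


def multi_level_tiling(state: State, block: str) -> List[State]:
    """Toy rule: offer two tiling structures for matmul blocks, which forks the space."""
    if block.startswith("matmul"):
        return [
            state + (f"tile[SSRSRS]({block})",),
            state + (f"tile[SRS]({block})",),
        ]
    return [state]


def apply_rules(blocks: List[str], rules: List[Rule]) -> List[State]:
    """Visit blocks in the given order and apply every rule to every current state."""
    states: List[State] = [()]
    for block in blocks:
        for rule in rules:
            states = [new for state in states for new in rule(state, block)]
    return states


space = apply_rules(["elemwise_relu", "matmul"], [inline_spatial, multi_level_tiling])
for state in space:
    print(state)
```

A rule that returns more than one state is what makes the result a set of candidates rather than a single schedule.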
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of the meta schedule design space, and that meta schedule further allows mixing those
+three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a more natural representation of AutoTVM’s schedule templates (knobs). The
+execution traces generated by the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling: one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# tracing must be enabled to record the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+Below is an example of the printed trace, which faithfully reproduces the schedule as a snippet of
+Python scheduling code:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
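The ideas of recording, replaying and forking a trace can be sketched with a small data structure. This is a toy model, not the system's implementation: the `Instruction` record, the `replay` and `fork` helpers, and the `candidates` field are all hypothetical names chosen for illustration, and a sampling instruction is simplified to a pick from a fixed list.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Instruction:
    kind: str                        # "sample" or a primitive name such as "split"
    candidates: Sequence[int] = ()   # options available to a sampling instruction
    decision: Optional[int] = None   # the choice recorded when the instruction ran


def replay(trace: List[Instruction], rng: random.Random) -> List[Instruction]:
    """Re-execute a trace, drawing a fresh decision for every sampling instruction."""
    return [
        replace(inst, decision=rng.choice(list(inst.candidates)))
        if inst.kind == "sample"
        else inst
        for inst in trace
    ]


def fork(trace: List[Instruction], index: int, decisions: List[int]) -> List[List[Instruction]]:
    """Fork a trace into one copy per pinned decision at position `index`."""
    return [
        trace[:index] + [replace(trace[index], decision=d)] + trace[index + 1:]
        for d in decisions
    ]


trace = [
    Instruction("sample", candidates=(1, 2, 4, 8), decision=4),
    Instruction("split"),
    Instruction("reorder"),
]
new_point = replay(trace, random.Random(0))   # same instructions, possibly new decision
forked = fork(trace, 0, [2, 8])               # two traces whose union is explored
print(new_point[0].decision, [t[0].decision for t in forked])
```

The instruction list stays fixed across replays; only the recorded decisions change, which is what lets a trace stand for a whole design space.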
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system can repeatedly invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation may result in a different IRModule after scheduling, because the random decisions can
+change across runs. Effectively, it is equivalent to random exploration without a trace. This gives
+users the flexibility to define arbitrary functions that a trace may not support well (e.g. control
+flow divergence based on the value of intermediate random variables), but it rules out any
+trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present in the
+traces, then their random decisions are mutated during each replay, i.e. each replay jumps to a new
+point in the design space. Therefore, repeated replay of those traces is equivalent to exploration
+of the design space. Our system could potentially benefit from trace-based analysis, including
+rejecting obviously invalid schedules (e.g. using too many CUDA resources), doing dead-code
+elimination to simplify a trace, extracting trace-based features used in the cost model, etc.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
+of rules:
+
+* Mutator: defines how to jump to a point’s "neighbor" in the design space
+* Postprocessor: after the trace is executed, there are some extra rules we want to execute, for
+  example:
+  * Check CUDA resource limits: There is a hard requirement in CUDA that the number of threads per
+    block must not exceed 1024, but the thread count is a random variable that cannot be determined
+    before actually executing the trace. In this case, we write a postprocessor that errors out when
+    the condition is not satisfied.
+  * Fuse outer loops until the extent of the fused loops is large enough: The number of outer loops
+    to be fused together depends on their extents, which are random variables. In this case, we
+    annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then
+applies postprocessors, and asks the cost model to predict their performance. After several
+iterations, the new schedules with the highest scores are finally compiled and measured on device.
+Epsilon-greedy is used in this process to balance exploitation and exploration.
+
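The loop described above (mutate, postprocess, score, select with epsilon-greedy) can be sketched as a toy search. Everything here is an assumption made for illustration: a "point" is just a pair of thread extents, `cost_model` is a hand-written scoring function rather than a learned model, and the final on-device measurement step is omitted.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[int, int]                  # toy stand-in for the decisions of one trace
CHOICES = [1, 2, 4, 8, 16, 32, 64, 128]
MAX_THREADS = 1024                       # the CUDA thread limit named in the text


def mutate(point: Point, rng: random.Random) -> Point:
    """Mutator: jump to a neighbor by re-drawing one of the two decisions."""
    values = list(point)
    values[rng.randrange(2)] = rng.choice(CHOICES)
    return (values[0], values[1])


def postprocess(point: Point) -> Optional[Point]:
    """Postprocessor: reject points whose thread count exceeds the limit."""
    return point if point[0] * point[1] <= MAX_THREADS else None


def cost_model(point: Point) -> float:
    """Toy predicted score, higher is better (a real model is learned from measurements)."""
    return -abs(point[0] * point[1] - 256) - abs(point[0] - point[1])


def evolutionary_search(rng: random.Random, population_size: int = 8,
                        iterations: int = 20, epsilon: float = 0.2) -> Point:
    population: List[Point] = []
    while len(population) < population_size:
        candidate = postprocess((rng.choice(CHOICES), rng.choice(CHOICES)))
        if candidate is not None:
            population.append(candidate)
    for _ in range(iterations):
        children = [postprocess(mutate(p, rng)) for p in population]
        pool = population + [c for c in children if c is not None]
        pool.sort(key=cost_model, reverse=True)
        # Epsilon-greedy: usually keep a top-ranked point, sometimes a random one.
        population = [
            rng.choice(pool) if rng.random() < epsilon else pool[rank]
            for rank in range(population_size)
        ]
    return max(population, key=cost_model)


best = evolutionary_search(random.Random(0))
print(best, cost_model(best))
```

Because invalid children are dropped before selection, every point that survives to the next generation already satisfies the resource limit.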
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at
+providing a playground for developers to try out new ideas and potentially deliver performance
+quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can be
+easily switched to customized python implementation. For example,
+
+**Customize design space in python**. The design space can be a python function that performs the
+scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in python**. We provide two ways to define a composite schedule in
+python:
+
+Method 1. Derive from `PyCompositeSchedule`, and implement two methods `initialize` and `apply`:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        # initialize the class, usually this method is empty
+        ...
+
+    def apply(self, sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+        # write any python code, including:
+        # - analyze `block`
+        # - invoke schedule primitives in `sch`
+        # - do debug printing
+        ...
+```
+
+Method 2. Use a decorator as syntactic sugar when the `initialize` method is empty; it converts the
+function into the `apply` method.
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    # write any python code, including:
+    # - analyze `block`
+    # - invoke schedule primitives in `sch`
+    # - do debug printing
+    ...
+```
+
+**Customize exploration strategies in python**. Developers can implement any search algorithm in
+python as well by deriving from `PySearchPolicy`, and the syntax is identical to customizing with
+`PyCompositeSchedule`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In summary, almost every component of the system is decoupled from the others, and extensions can
+be plugged in easily.
+
+## 5. Drawbacks
+
+We are not aware of any drawbacks of the proposed system.
+
+## 6. Rationale and alternatives
+
+The system is designed with the principle of minimalism: different from alternative solutions, we do
+not require any change in existing codebase, or extra APIs to learn. It could potentially lower the
+bar of using automation systems. 
+
+Unifying manual scheduling, AutoTVM's semi-automatic templates and AutoScheduler's (Ansor's) fully
+automatic sketch generation provides a flexible way to balance the injection of new domain knowledge
+and automation.
+
+Flexibility in customization allows quick try-out on new tasks, new strategies and new hardware
+targets without deep knowledge of the system.
+
+## 7. Prior art
+
+**Tensor Expression (TE)** in TVM is a DSL that decouples compute and schedule, which provides
+convenient ways to handcraft optimized kernels for different hardware targets.
+
+**TensorIR** is the latest generation of TVM’s low-level IR. Its capability of eagerly applying
+schedule primitives opens the door for meta schedule, our proposed new-generation auto scheduling
+system.
+
+**AutoTVM** is the 1st generation automation framework in TVM, which requires developers to
+implement per-operator scheduling templates, and the system could handle the tuning process.
+
+**AutoScheduler (Ansor)** is the 2nd generation automation framework in TVM, whose built-in rules
+could automatically generate schedule templates for almost all the operators on CPU, GPU, etc.
+
+## 8. Unresolved questions
+
+**Supporting Control Flow and Assertions**
+
+Right now the meta schedule DSL does not support control flow. Although we have not seen any
+real-world use case so far, it is possible that one will appear in some future workloads.
+
+A real-world issue we could see is that sampling may lead to wrong schedules on CUDA, e.g. the
+schedule results in a CUDA program that uses too much shared memory, too many threads, etc. In this
+case, we need to halt the program immediately. Therefore, introducing assertions may be helpful.
+
+## 9. Future possibilities
+
+**Unifying Manual Scheduling, AutoTVM and Ansor in TOPI**
+
+Meta schedule provides an idiomatic approach to unify the three existing scheduling APIs in TVM:
+
+* Manual schedules are meta schedules without sampling instructions
+* AutoTVM templates are meta schedules where knobs are replaced by sampling instructions
+* Each of Ansor’s search rules generates a snippet of a meta schedule
+
+We further allow mixing different styles of scheduling and exploring the union space, which could

Review comment:
       Added an explanation:
   
   ```markdown
   At the time of writing, TOPI contains a number of schedule functions implemented either in manual TE
   or AutoTVM-style. It is our future work to unify these existing scheduling APIs on TOPI operators,
   and enable different styles to be auto-tuned jointly.
   ```







[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-871760591


   Updated according to the comments. Would you guys like to re-review? @tqchen 





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676880624



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-executing either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule can repeatedly invoke such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)

Review comment:
       Will add these to Section 4.1. Trace







[GitHub] [tvm-rfcs] tqchen commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-872245721


   @FrozenGene @comaniac @tkonolige please take another look





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648037427



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       Yeah sure I could add an architectural review image explaining the workflow.
   
   However, I feel like describing all the main classes in detail doesn't really help in this particular case. First of all, most of those classes have been described in this draft (Trace, search space generation, search space exploration, cost model, composite schedule, etc.); second, Section 4.4 lays out an engineering plan that covers similar ground (the important classes are explained in previous sections); last, I am a bit worried that some of the design may go out of date and this doc could become misleading...
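
As a side note on the `sample_perfect_tile` calls in the quoted Section 3.2: the intended semantics, as far as the draft describes them, are "pick `n` tile sizes whose product equals the loop extent". Below is a minimal sketch in plain Python, under that assumption. The function here is a hypothetical stand-in, not the TVM implementation, and its sampling distribution is not claimed to match the real one.

```python
import random


def sample_perfect_tile(extent, n, rng):
    """Return `n` positive integers whose product is exactly `extent`.

    Hypothetical stand-in for the sampling instruction in the RFC; the
    distribution over tilings here is arbitrary.
    """
    factors = []
    remaining = extent
    for _ in range(n - 1):
        # Only divisors of what is left keep the tiling "perfect".
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        f = rng.choice(divisors)
        factors.append(f)
        remaining //= f
    factors.append(remaining)  # the last tile absorbs the remainder
    return factors
```

Each call yields one point in the design space, for example a 4-way split of a loop of extent 1024 that can be passed to `split` as `factors`.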
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661750359



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.

Review comment:
       Every piece of the system can be customized easily in pure python or C++ or both. For example, one can develop a new rule in python, a new ProgramRunner in python, a new design space generator in python, etc. I am going to add this paragraph in the text to help clarify
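
To make "develop a new rule in python" concrete, here is a rough sketch in plain Python, without TVM. The `ScheduleRule` base class, the dictionary-based block description and `post_order_apply` are all hypothetical; they only mirror the idea that a rule inspects a block and emits schedule primitives, and that a generator applies every rule to every block in order.

```python
class ScheduleRule:
    """Hypothetical rule interface: inspect a block, return primitives to apply."""

    def apply(self, block):
        raise NotImplementedError


class InlineElementwise(ScheduleRule):
    """Inline an elementwise block into its consumer when it has one."""

    def apply(self, block):
        if block["elementwise"] and block["consumers"]:
            return [("compute_inline", block["name"])]
        return []  # not inlinable: emit nothing


def post_order_apply(blocks, rules):
    """Apply every rule to every block, in the given order, and collect a trace."""
    trace = []
    for block in blocks:
        for rule in rules:
            trace.extend(rule.apply(block))
    return trace


# Blocks B -> C -> D, where D has no consumer.
blocks = [
    {"name": "B", "elementwise": True, "consumers": ["C"]},
    {"name": "C", "elementwise": True, "consumers": ["D"]},
    {"name": "D", "elementwise": True, "consumers": []},
]
trace = post_order_apply(blocks, [InlineElementwise()])
```

A user-defined rule would then be one more subclass added to the `rules` list, alongside any built-in ones.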







[GitHub] [tvm-rfcs] areusch commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r677029066



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,604 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/
+* GitHub Issue: https://github.com/apache/tvm/issues/8473
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
+approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is the 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling.** In TVM,
+both TensorIR and TE allow direct or indirect IR transformation guided by a

Review comment:
       can we specify more about the IR transformation here? e.g. we need to answer the question: transform from what to what? otherwise you're confused unless you know TVM already.
   
   I think it's confusing to explain because the transformation can either be from TE -> TensorIR or TensorIR -> TensorIR.
   
   here's a stab: "TVM initially describes all model operators using an abstract description of the computation. Such abstractions can be described either in the Tensor Expression IR (the standard prior to Meta Scheduling) or in TensorIR (as a naïve computation). Through a process known as **scheduling**, TVM allows transformation of these IR to an imperative, optimized description of the implemented computation. Such transformation is guided by a developer-provided program..."

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,604 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/
+* GitHub Issue: https://github.com/apache/tvm/issues/8473
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
+approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is the 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling.** In TVM,
+both TensorIR and TE allow direct or indirect IR transformation guided by a
+developer-provided program, for example,
+specifying a particular reordering of loops for better locality,
+or tensorizing a compute region with specific hardware intrinsics.
+The process of invoking such a set of pre-defined transformations is called "**scheduling**",
+and each of such transformations is called a "**schedule primitive**".
+These primitives form a domain-specific language (DSL).
+
+**Design space.** The set of all possible schedulings of a TE/TensorIR is called its design space.
+Optimization in TVM is essentially exploring such space to find out a scheduling that transforms the
+IR to generate the kernel with optimal performance.
+
+### Problems with the current scheduling system
+
+Currently there are 3 sets of scheduling APIs in TVM:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler.
+* **AutoTVM**: The automation system requires users to define the design space through
+  per-operator "schedule templates." Therefore, programmer time is a bottleneck in scaling
+  to hundreds of operators across many hardware platforms.
+* **AutoScheduler**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+### Benefits of Meta Schedule
+
+The existing three scheduling systems have diverged and are mutually incompatible in terms of API
+design: besides manual TE scheduling, AutoTVM requires users to learn a new set of
+APIs, and AutoScheduler brings in another set of C++-based search rules. This adds to users' mental
+overhead in understanding and extending the existing systems. Further, the inability to switch between
+template-based and template-free auto-tuning could lead to inferior customizability and hence
+make it needlessly difficult to achieve optimal performance.
+
+Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in each of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+Meta Schedule DSL is a language that provides TVM backend developers
+a flexible way to define or auto-generate the operator design space.
+
+This section introduces the syntax of Meta Schedule DSL by describing the 5 common usage patterns
+envisioned by this RFC. These patterns are:
+1) Manually constructing a schedule using existing schedule primitives (Section 3.1);
+2) Defining composite schedules to simplify the application of a sequence of schedule primitives (Section 3.2);
+3) Describing a design space of possible schedules,
+a.k.a. AutoTVM-style schedule templates (Section 3.3);
+4) Automatically generating the design space, a.k.a. AutoScheduler-style search rules (Section 3.4);
+5) Mixing the usage of manual schedule, AutoTVM and AutoScheduler-style design space specification
+in Meta Schedule (Section 3.5).
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this manual scheduling example, the developers tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code. Take the code snippet in the previous section as an example: a sequence of `split`s
+is invoked, followed by a `reorder`. Taken together, these 4 primitives are colloquially known as
+"SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR.
+The word **composite** here means the schedule transformation is *composed* of those **primitives**.
+
+For example, suppose there is a composite schedule called `InlineElementwiseOperation`, which
+inlines elementwise computations into their consumers if possible. Applying it to the
+following TensorIR:
+
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("B") ...:
+      B[i, j] = A[i, j] + 1
+  for i, j in ...:
+    with tir.Block("C") ...:
+      C[i, j] = B[i, j] + 1
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = C[i, j] + 1
+
+sch = tir.Schedule(example_func)
+# `InlineElementwiseOperation` is a composite schedule rule that analyzes a given block.
+# If the block contains only elementwise computation, and can be inlined into its consumer,
+# then `sch.compute_inline` is called on that block.
+inliner = InlineElementwiseOperation()
+inliner.apply(schedule=sch, block=sch.get_block("B"))
+inliner.apply(schedule=sch, block=sch.get_block("C"))
+inliner.apply(schedule=sch, block=sch.get_block("D"))
+```
+
+Below is the result after applying this composite schedule, and its corresponding trace:
+
+```python
+
+>>> print(tvm.script.asscript(sch.mod))
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = A[i, j] + 1 + 1 + 1
+
+>>> print(sch.trace)
+# Block "B" is elementwise and inlinable, then `sch.compute_inline(B)` is called
+B = sch.get_block("B")
+sch.compute_inline(B)
+# Block "C" is elementwise and inlinable, then `sch.compute_inline(C)` is called
+C = sch.get_block("C")
+sch.compute_inline(C)
+# Block "D" is elementwise but does not have a consumer,
+# so the rule does not call `compute_inline` because it is not inlinable
+D = sch.get_block("D")
+```
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with a set of new schedule primitives
+called **sampling instructions**. These primitives do not transform the TensorIR,
+but instead introduce random statistical variables which can be referenced later in scheduling
+to parameterize the schedule. Incorporating **sampling instructions** into an operator's schedule
+allows the backend developers to succinctly describe a design space in terms of
+tiling strategies, fusion levels, unroll lengths, etc.
+
+The matmul example above is extended to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)  # was: i_tiles = [16, 8, 8, 8]
+j_tiles = sch.sample_perfect_tile(j, n=4)  # was: j_tiles = [16, 8, 8, 8]
+k_tiles = sch.sample_perfect_tile(k, n=2)  # was: k_tiles = [256, 8]
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+To generate design space, AutoScheduler applies a set of rules to each
+[TE stage](https://tvm.apache.org/docs/api/python/te.html#tvm.te.Stage) that corresponds to a
+[TE operation](https://tvm.apache.org/docs/api/doxygen/classtvm_1_1te_1_1Operation.html),
+defined by [`te.compute(...)`](https://tvm.apache.org/docs/api/python/te.html#tvm.te.compute).
+The rules analyze the TE operations and apply an [internal DSL]() to manipulate its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
+
+Composite schedule rules work in a similar way when scheduling TensorIR, as introduced in Section 3.2.
+They analyze the TensorIR and apply schedule primitives directly to TensorIR accordingly.
+When such rules are applied to each TensorIR block in a certain order (Post-DFS is provided as the
+builtin order, but customization is allowed),
+they generate a sequence of schedule primitives.
+This process corresponds to the *sketch generation* phase in AutoScheduler.
+If sampling instructions are present in this sequence,
+then a design space is defined by those instructions for the meta schedule to explore.
+This process is similar to the *random annotation* phase in AutoScheduler.
+
+Several built-in composite schedule rules are shipped with meta schedule to align with the design
+space generated by AutoScheduler:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / AutoScheduler
+
+This subsection shows that the design spaces induced by TE manual schedule, AutoTVM and AutoScheduler
+are all subsets of the design space of meta schedule, and meta schedule further allows mixing those
+three styles to search jointly.
+
+- **Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+- **AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule with
+sampling instructions is a natural representation of AutoTVM’s schedule templates (knobs).
+The probability space supported by the sampling instructions is the design space to
+be explored.
+- **AutoScheduler (Template-free tuning)**. As mentioned in the previous section, application
+  of composite schedule rules generates the design space, which is equivalent to AutoScheduler’s
+  sketch generation.
+- **Mixing styles in design space definition**. By taking union of the spaces induced by the three
+  special cases, meta schedule allows developers to combine generic rules that AutoScheduler
+  provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+This section introduces the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. This list of scheduling instructions being invoked, along with the random decisions made
+on sampling instructions, is called a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty in scheduling - one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Meta schedule works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, it is allowed to fork the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:

Review comment:
       if this isn't user-facing, how is the user supposed to invoke Meta Scheduler in a repeatable way? is there a serialization mechanism provided for the implementation (in particular `decisions` dict below)?
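
To illustrate the kind of serialization being asked about, here is a minimal sketch in plain Python. It assumes, hypothetically, that a trace can be reduced to a list of instruction names plus a `decisions` mapping from the index of each sampling instruction to the values it picked. The function names and the JSON layout are made up for illustration and are not the format proposed by the RFC.

```python
import json


def trace_to_json(instructions, decisions):
    """Serialize a toy trace: instruction names plus sampling decisions."""
    payload = {
        "instructions": list(instructions),
        # JSON object keys must be strings, so instruction indices are stringified.
        "decisions": {str(index): values for index, values in decisions.items()},
    }
    return json.dumps(payload, sort_keys=True)


def trace_from_json(text):
    """Inverse of `trace_to_json`; restores integer instruction indices."""
    payload = json.loads(text)
    decisions = {int(index): values for index, values in payload["decisions"].items()}
    return payload["instructions"], decisions


instructions = [
    "Sample-Perfect-Tile", "Sample-Perfect-Tile", "Sample-Perfect-Tile",
    "Split", "Split", "Split", "Reorder",
]
decisions = {0: [16, 8, 8, 8], 1: [16, 8, 8, 8], 2: [256, 8]}
restored = trace_from_json(trace_to_json(instructions, decisions))
```

Replaying the restored instructions with the recorded decisions, instead of sampling again, would be one way to make a tuned schedule repeatable across runs.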







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661787385



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.

Review comment:
       I updated the section by adding a paragraph "trace forms design space", updated the "union of traces" to "union of design space" and refined the description. Hopefully it improves!
   
   ```markdown
   **Trace forms design space.** A trace may contain zero or more sampling instructions, which introduce uncertainty into scheduling: each concrete set of sampling decisions yields one point in the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile sizes works best on specific hardware.
   
   **Union of design space**. Our system works on a set of traces, representing the union of the design spaces represented by every single trace.
   ```
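
As a rough illustration of "a trace forms a design space", the sketch below enumerates every outcome a perfect-tile sampling instruction could produce for a small loop extent and multiplies the counts across the three sampling instructions of the matmul trace. The helper `perfect_tilings` and the extent of 16 are assumptions made for this example; they are not part of the meta schedule API.

```python
def perfect_tilings(extent, n):
    """Yield every ordered list of n positive integers whose product is extent."""
    if n == 1:
        yield [extent]
        return
    for d in range(1, extent + 1):
        if extent % d == 0:
            for rest in perfect_tilings(extent // d, n - 1):
                yield [d] + rest


# Possible decisions for each sampling instruction in the matmul trace,
# assuming (for illustration only) that i, j and k all have extent 16.
i_space = list(perfect_tilings(16, 4))
j_space = list(perfect_tilings(16, 4))
k_space = list(perfect_tilings(16, 2))

# One point in the design space = one decision per sampling instruction.
trace_space_size = len(i_space) * len(j_space) * len(k_space)

# A second, structurally different trace (2-level tiling for i and j).
other_space_size = len(k_space) ** 3

# Distinct traces contribute disjoint points, so the union adds their sizes.
union_size = trace_space_size + other_space_size
print(trace_space_size, other_space_size, union_size)
```

A trace without sampling instructions contributes exactly one point, which matches the manual-schedule case described in Section 3.5.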







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661760257



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.

Review comment:
       PPL stands for probabilistic programming language. I will remove this term from this RFC completely.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668387903



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety of
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of
+  the system can be customized easily in pure Python or C++ or both. For example, one can develop a
+  new design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage.
+A SearchRule analyzes TE and eagerly triggers schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. AutoTVM’s schedule templates (knobs) are naturally
+represented by writing one or more schedule functions in meta schedule with sampling
+instructions. The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduce uncertainty into scheduling: each concrete set of sampling decisions yields one point in
+the design space. Therefore, the trace itself forms a design space to explore, e.g. which set of
+tile sizes works best on specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to

Review comment:
       For example, in a `conv2d-relu` subgraph, when fusing relu into the loop nests of conv2d, we need to decide the fusion level, i.e. should I compute relu on the second tile or the third one? In this case, these decisions are equally important.
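
The fusion-level example can be sketched as forking one trace into two. This is a toy model under stated assumptions: a trace is represented as a plain list of `(instruction, argument)` pairs, and both the `fork` helper and the `"Compute-At"` instruction name are hypothetical labels for illustration rather than the RFC's data structures.

```python
import copy


def fork(trace, instruction, choices):
    """Return one independent copy of `trace` per choice, each extended by one step."""
    forks = []
    for choice in choices:
        forked_trace = copy.deepcopy(trace)  # forks must not share state
        forked_trace.append((instruction, choice))
        forks.append(forked_trace)
    return forks


# Shared prefix: tiling of the conv2d loop nest (arguments simplified to loop names).
base = [
    ("Sample-Perfect-Tile", "i"),
    ("Sample-Perfect-Tile", "j"),
    ("Sample-Perfect-Tile", "k"),
    ("Split", "i"),
    ("Split", "j"),
    ("Split", "k"),
    ("Reorder", None),
]

# Two equally plausible fusion levels for relu give two traces to explore.
forked = fork(base, "Compute-At", ["second spatial tile", "third spatial tile"])

# The design space is the union of the spaces of the forked traces.
for trace in forked:
    print(len(trace), trace[-1])
```

Each forked trace still contains the three sampling instructions, so each one spans its own set of tile-size choices on top of the fixed fusion level.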







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668438352



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it is inextensible to hundreds of operators and variety
+  hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. Every component of the
+  system can be customized easily in pure python or C++ or both. For example, one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply
+a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
+schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
+these instructions parametrize the schedule from a single deterministic point to a space supported
+by random variables (tile size, etc.), making it possible for developers to describe the design
+space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
+SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
+maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
+in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
+used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
+equivalent to applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align with Ansor's design
+space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are
+all subsets of meta schedule, and meta schedule further allows mixing those three styles to search
+jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. It is more natural representation of AutoTVM’s schedule
+templates (knobs) by writing one or more schedule functions in meta schedule with sampling
+instructions. The execution traces generated by the schedule functions are the design space to be
+explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application
+of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch
+generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three
+special cases, our system allows developers to combine generic rules that Ansor provides and
+operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and
+explore the design space. The figure below briefly illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
+records all the instructions users applied to the schedule class, including sampling and schedule
+primitives. We call this list of instructions a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling instructions, which
+introduces the uncertainty in scheduling - one instance of sampling results in one point in the
+design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to
+generate high-performance schedules, we allow forking the trace into two, and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with the following syntax:
+
+```python
+# requires to trace the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which honestly reflects the schedule as a snippet of
+python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Design Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
+for efficient schedules.
+
+**Random search by replaying schedule functions.** With a user-provided schedule function
+as a black-box design space generator, our system could repetitively invoke such an opaque function
+without doing any extra analysis. The function could be written in C++ or Python, or any language
+that implements packed function FFI. If sampling instructions are present in the function, then each
+invocation results in a different IRModule after being scheduled because the random decisions are
+possibly changed across different runs. Effectively, it is equivalent to
+random exploration without trace, allowing the flexibility for users to define arbitrary functions
+that trace may not well support (e.g. control flow divergence based on the value of intermediate
+random variables), but it forbids future opportunity of any trace-based analysis.
+
+**Random search by replaying traces.** Traces are obtained from a design space generator, and
+replayed with a builtin interpreter in our system. If sampling instructions are present on the

Review comment:
       Yeah, that's correct. I am going to define "replay" as the re-execution of the trace.
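
   To make "replay" concrete, here is a toy sketch under stated assumptions: the trace is a list of `(kind, args)` pairs, and `sample_perfect_tile` below is a simplified re-implementation written for illustration, not the TVM one. Replaying re-executes the trace from the top and draws fresh decisions for every sampling instruction.

```python
import random


def sample_perfect_tile(extent, n, rng):
    """Pick n positive integer factors whose product is exactly `extent`."""
    factors = []
    remaining = extent
    for _ in range(n - 1):
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        factor = rng.choice(divisors)
        factors.append(factor)
        remaining //= factor
    factors.append(remaining)  # the last factor absorbs whatever is left
    return factors


# Toy version of the 7-instruction matmul trace; loop extents are assumed to be 1024.
trace = [
    ("sample", {"loop": "i", "extent": 1024, "n": 4}),
    ("sample", {"loop": "j", "extent": 1024, "n": 4}),
    ("sample", {"loop": "k", "extent": 1024, "n": 2}),
    ("split", {"loop": "i"}),
    ("split", {"loop": "j"}),
    ("split", {"loop": "k"}),
    ("reorder", {}),
]


def replay(trace, seed):
    """Re-execute the trace, re-drawing the decision of each sampling instruction."""
    rng = random.Random(seed)
    decisions = {}
    for kind, args in trace:
        if kind == "sample":
            decisions[args["loop"]] = sample_perfect_tile(args["extent"], args["n"], rng)
        # Non-sampling instructions are deterministic given the decisions above.
    return decisions


print(replay(trace, seed=0))
```

   Different seeds give different points in the design space, while the same seed reproduces the same decisions.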







[GitHub] [tvm-rfcs] areusch commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r669002230



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -22,36 +22,48 @@
 
 ## 1. Summary
 
-This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
 approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
 the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
 tensorization and loop partitioning, and customizability on every layer of the automation system.
 
-Meta Schedule is our 3rd generation automatic scheduling system.
+Meta Schedule is the 3rd generation automatic scheduling system.
 
 ## 2. Motivation
 
 **Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
-sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+sequence of transformations. For example, reordering loops for better locality and tensorizing for
 specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
 called "**scheduling**", and each transformation is called a "**schedule primitive**". These
 primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
 **Design space** is the set of all possible schedulings with respect to a TensorIR program.
 
-**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+### Problems with the current scheduling system
+
+Currently there are have 3 sets of scheduling APIs in TVM:

Review comment:
       nit: Currently there are 3 sets of

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -64,8 +76,17 @@ primitives form a domain-specific language (DSL) describing the transformation o
 
 ## 3. Guide-level explanation
 
-In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
-and auto-generate the design space.
+Meta Schedule DSL is a flexible way to define or auto-generate the design space.

Review comment:
       might say: Meta Schedule DSL is a language that provides TVM backend integrators a flexible way to define or auto-generate the operator design space.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -22,36 +22,48 @@
 
 ## 1. Summary
 
-This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
 approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
 the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
 tensorization and loop partitioning, and customizability on every layer of the automation system.
 
-Meta Schedule is our 3rd generation automatic scheduling system.
+Meta Schedule is the 3rd generation automatic scheduling system.
 
 ## 2. Motivation
 
 **Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
-sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+sequence of transformations. For example, reordering loops for better locality and tensorizing for
 specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
 called "**scheduling**", and each transformation is called a "**schedule primitive**". These
 primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
 **Design space** is the set of all possible schedulings with respect to a TensorIR program.
 
-**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+### Problems with the current scheduling system
+
+Currently there are have 3 sets of scheduling APIs in TVM:
 * **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
   i.e. explore points in the design space with humans in the loop. This can be a tedious and
   error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
-* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
-  space for each operator. Therefore, it is inextensible to hundreds of operators and variety
+* **AutoTVM**: The automation system requires users to define the design space through
+  per-operator "schedule templates." Therefore, programmer time is a bottleneck in scaling
+  to hundreds of operators across many hardware platforms.
   hardware platforms.
 * **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
   according to a set of predefined "search rules". However, it is non-trivial to extend
   AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
 * The three systems above have isolated sets of APIs with several layers of their own abstraction,
   which are not only hard to learn, but also engineering-intensive to customize.
 
-**Benefits of Meta Schedule.**  Meta schedule provides:
+### Benefits of Meta Schedule
+
+The existing three scheduling systems are mutually incompatible with each other in terms of API
+design and divergence: besides manual TE scheduling, AutoTVM requires users to learn a new set of
+APIs, and AutoScheduler brings in another set of C++-based search rules. It adds the users' mental
+overhead to understand and extend the existing systems. Further, the inability to switch between
+template-based and template-free auto-tuning could lead to inferior customizability and hence worse

Review comment:
       or even: and hence make it needlessly difficult to achieve optimal performance.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -64,8 +76,17 @@ primitives form a domain-specific language (DSL) describing the transformation o
 
 ## 3. Guide-level explanation
 
-In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
-and auto-generate the design space.
+Meta Schedule DSL is a flexible way to define or auto-generate the design space.
+
+This section introduces its syntax of meta schedule DSL and usage in terms of describing and
+auto-generating the design space, more specifically, its APIs for:
+1) Manually constructing a schedule using existing schedule primitives (Section 3.1);
+2) Defining composite schedule to simplify the ap sequence of schedule primitives (Section 3.2);
+3) Describing a design space of possible schedules,
+a.k.a. AutoTVM-style schedule templates (Section 3.3);
+4) Automatically generating the design space, a.k.a. Ansor-style search rules (Section 3.4);

Review comment:
       I would stick to one name here--so suggest replacing Ansor with AutoScheduler even though Ansor originated the idea. You can explain elsewhere that AutoScheduler was derived from Ansor.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -64,8 +76,17 @@ primitives form a domain-specific language (DSL) describing the transformation o
 
 ## 3. Guide-level explanation
 
-In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
-and auto-generate the design space.
+Meta Schedule DSL is a flexible way to define or auto-generate the design space.
+
+This section introduces its syntax of meta schedule DSL and usage in terms of describing and
+auto-generating the design space, more specifically, its APIs for:

Review comment:
       ```suggestion
   This section introduces the syntax of Meta Schedule DSL by describing the 5 common usage patterns envisioned by this RFC. These patterns are:
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -22,36 +22,48 @@
 
 ## 1. Summary
 
-This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the
 approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
 the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
 tensorization and loop partitioning, and customizability on every layer of the automation system.
 
-Meta Schedule is our 3rd generation automatic scheduling system.
+Meta Schedule is the 3rd generation automatic scheduling system.
 
 ## 2. Motivation
 
 **Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
-sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+sequence of transformations. For example, reordering loops for better locality and tensorizing for
 specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
 called "**scheduling**", and each transformation is called a "**schedule primitive**". These

Review comment:
       I would introduce this first as a general concept, because a key detail here is that all 3 scheduling APIs aim to achieve the same thing: translating a Relay subgraph into a TIR subgraph or PrimFunc. It would also help to note that TVM's approach to workload-specific optimization is to represent such optimizations in TensorIR.
   
   It's not true that generally, TensorIR is always optimized by a set of TIR transformations. Only with AutoScheduler is this true, correct? With AutoTVM, TensorIR is merely templated. That's a key benefit you could introduce later of AutoScheduler APIs.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Take the code snippet in the previous section as an example, a sequence of `split`s
+are invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.

Review comment:
       yeah i like this better, tweaking it a bit:
   
   ```suggestion
   scheduling code. Take the code snippet in the previous section as an example: a sequence of `split`s
   is invoked, followed by a `reorder`. Taken together, these 4 primitives are colloquially known as "SSRSRS" tiling.
   ```
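
   As an illustration of why this pattern is worth packaging, the loop order of such a tiling can be derived mechanically from the structure string. The sketch below is an assumption-laden toy: `tile_order` is a hypothetical helper and the loops are plain strings rather than schedule loop handles.

```python
def tile_order(structure, spatial, reduction):
    """Interleave tiled loops according to a structure string such as "SSRSRS".

    `spatial` and `reduction` hold one list of tiled loop names per axis,
    ordered from the outermost tile to the innermost tile.
    """
    s_level = r_level = 0
    order = []
    for kind in structure:
        if kind == "S":
            order.extend(axis[s_level] for axis in spatial)
            s_level += 1
        elif kind == "R":
            order.extend(axis[r_level] for axis in reduction)
            r_level += 1
        else:
            raise ValueError(f"unknown tile kind: {kind!r}")
    return order


order = tile_order(
    "SSRSRS",
    spatial=[["i_0", "i_1", "i_2", "i_3"], ["j_0", "j_1", "j_2", "j_3"]],
    reduction=[["k_0", "k_1"]],
)
print(order)
```

   The resulting order is the argument list one would pass to `reorder` in the matmul example, which is the kind of boilerplate a composite schedule is meant to hide.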

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Take the code snippet in the previous section as an example, a sequence of `split`s
+are invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used against the word *primitive*, which means it is a transformation *composed* of those
+*primitives*.
+
+For example, suppose there is a composite schedule called `Inline-All-Elementwise-Operations`, which

Review comment:
       could you explain the parameters of `Inline-All-Elementwise-Operations`, since you use it below in code? Also, can you list at a high level the schedule primitives that are composed here? I think that's the unclear bit here and the rest is good. 

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Take the code snippet in the previous section as an example, a sequence of `split`s
+are invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used against the word *primitive*, which means it is a transformation *composed* of those

Review comment:
       follow-up from https://github.com/apache/tvm-rfcs/pull/5#discussion_r666310349 (GH won't allow me to continue the thread in review)
   
   in the previous sentence I think it's clear you're using *composite* and *primitive* together, so i might phrase this more like:
   "The word **composite** here means the schedule transformation is *composed* of those **primitives**"
   
   (use different emphases for "composed" than you use for "composite" and "primitives" since the latter two are definitions and the former is emphasis)

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.

Review comment:
       Could you explain what a TE stage is?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulating its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
 
-As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
-in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
-used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
-equivalent to applying composite schedule rules to each block in TensorIR.
+Composite schedule rules work in a similar way scheduling TensorIR, as introduced in Section 3.2.
+It analyzes the TensorIR and apply schedule primitives directly to TensorIR accordingly.
+When applying such rules to each TensorIR block in certain order (e.g. Post-DFS Order),
+it generates a sequence of schedule primitives.
+If the sampling instructions are present in this sequence, 

Review comment:
       nit: delete "the"

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulating its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
 
-As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
-in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
-used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
-equivalent to applying composite schedule rules to each block in TensorIR.
+Composite schedule rules work in a similar way scheduling TensorIR, as introduced in Section 3.2.
+It analyzes the TensorIR and apply schedule primitives directly to TensorIR accordingly.
+When applying such rules to each TensorIR block in certain order (e.g. Post-DFS Order),

Review comment:
       When would it be applied in any other order? If never, could you just state: "A composite schedule rule inspects a given TensorIR fragment and applies a sequence of schedule primitives to transform the TensorIR. Composite schedule rules are always applied in post-DFS order"
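
   For reference, a toy sketch of what "applied in post-DFS order" means here, assuming the blocks form a tree given as a plain dict and each rule is a dict with a name and a predicate. All names below are hypothetical and do not come from the RFC.

```python
def post_order(block, children):
    """Yield blocks so that every child is visited before its parent."""
    for child in children.get(block, []):
        yield from post_order(child, children)
    yield block


def apply_rules(root, children, rules):
    """Visit blocks in post-DFS order and record which rule fires on which block."""
    applied = []
    for block in post_order(root, children):
        for rule in rules:
            if rule["matches"](block):
                applied.append((rule["name"], block))
    return applied


# Hypothetical block tree: root contains matmul and relu; matmul has an init block.
children = {"root": ["matmul", "relu"], "matmul": ["matmul_init"]}

# Hypothetical rules: a predicate decides whether the rule applies to a block.
rules = [
    {"name": "multi_level_tiling", "matches": lambda b: b == "matmul"},
    {"name": "inline_elementwise", "matches": lambda b: b == "relu"},
]

print(list(post_order("root", children)))
print(apply_rules("root", children, rules))
```

   In this toy the recorded list of (rule, block) pairs plays the role of the generated sequence of schedule primitives.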

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulating its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called *sketch generation*.
 
-As introduced in Section 3.2, composite schedule rules are equivalent to AutoScheduler's SearchRule
-in TensorIR scheduling. To further generate a design space for scheduling, sampling instructions are
-used in composite schedule rules. Similarly, the sketch generation phase in AutoScheduler is
-equivalent to applying composite schedule rules to each block in TensorIR.
+Composite schedule rules work in a similar way scheduling TensorIR, as introduced in Section 3.2.
+It analyzes the TensorIR and apply schedule primitives directly to TensorIR accordingly.
+When applying such rules to each TensorIR block in certain order (e.g. Post-DFS Order),
+it generates a sequence of schedule primitives.
+If the sampling instructions are present in this sequence, 
+the support of the probability space form a design space of possible schedulings.

Review comment:
       AutoScheduler further explores the design space defined by those sampling instructions.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-execute either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule could repetitively invokes such an opaque TVM

Review comment:
       remove "could" here--state exactly what happens

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-execute either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule could repetitively invokes such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)

Review comment:
       follow-up from https://github.com/apache/tvm-rfcs/pull/5#discussion_r668395363
   
   this seems bad if you want to reproduce. is there a way to do that by supplying the trace, rather than manually passing in the decisions as a list?
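
One way to picture the reproducibility question is a trace that stores the decision made by each sampling instruction, so that a later run can reuse those decisions instead of drawing new ones. The sketch below is a toy model under that assumption; `Instruction`, `run`, and `CANDIDATES` are hypothetical and say nothing about what the meta schedule API actually exposes.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instruction:
    """Hypothetical trace entry: a name plus the decision made, if it was a sampling step."""
    name: str
    decision: Optional[int] = None


CANDIDATES = [1, 2, 4, 8, 16]  # illustrative tile sizes, not taken from the RFC


def run(rng: random.Random, replay: Optional[List[Instruction]] = None) -> List[Instruction]:
    """Execute a toy schedule; reuse decisions from `replay` when it is given."""
    trace: List[Instruction] = []
    for step, name in enumerate(["sample_tile_i", "sample_tile_j"]):
        if replay is not None:
            decision = replay[step].decision  # deterministic: reuse the recorded choice
        else:
            decision = rng.choice(CANDIDATES)  # non-deterministic: draw a fresh choice
        trace.append(Instruction(name, decision))
    trace.append(Instruction("split"))  # non-sampling step carries no decision
    return trace


first = run(random.Random())
again = run(random.Random(), replay=first)
```

In this model `again` equals `first` regardless of the random state, because the replay path never consults the generator.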

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.
+The rules analyze the TE operations and apply an internal DSL to manipulating its internal IR,

Review comment:
       is the DSL also internal or just the IR?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -107,29 +128,64 @@ best schedule according to measurement results on their device.
 As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
 basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
 real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
-scheduling code, as
-[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
-by developers in our community.
+scheduling code. Take the code snippet in the previous section as an example, a sequence of `split`s
+are invoked, followed by a `reorder`, and all these together are called "SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register **composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR. The word *composite* here
+is used against the word *primitive*, which means it is a transformation *composed* of those
+*primitives*.
+
+For example, suppose there is a composite schedule called `Inline-All-Elementwise-Operations`, which
+inlines all the elementwise computation into their consumers. Applying it to the following TensorIR:
+
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("B") ...:
+      B[i, j] = A[i, j] + 1
+  for i, j in ...:
+    with tir.Block("C") ...:
+      C[i, j] = B[i, j] + 1
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = C[i, j] + 1
+
+sch = tir.Schedule(example_func)
+InlineAllElementwiseOperations().apply(sch, sch.get_block("D"))
+print(tvm.script.asscript(sch.mod))
+```
+
+The result after applying the composite schedule is:
 
-To make it more convenient and modular, we allow users to register "composite schedules" that apply
-a sequence of schedule primitives according to certain analysis of the IR. For instance, a composite
-schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it.
+```python
+@tvm.script.tir
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = A[i, j] + 1 + 1 + 1
+```
 
 ### 3.3. AutoTVM-style Design Space Description
 
-Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule,
-these instructions parametrize the schedule from a single deterministic point to a space supported
-by random variables (tile size, etc.), making it possible for developers to describe the design
-space with meta schedule APIs.
+Meta schedule extends the schedule DSL with a set of new schedule primitives with randomness,
+called **sampling instructions**. These primitives do not transform the TensorIR,
+but instead will generate random decisions from specific distributions in each run,

Review comment:
       i started with a clarifying suggestion but wound up with a question. i think in general the technique is to describe a finite space and then select an element using statistical distributions, but would be great to clarify.
   
   ```suggestion
   Meta schedule extends the schedule DSL with a set of new schedule primitives 
   called **sampling instructions**. These primitives do not transform the TensorIR,
   but instead introduce random statistical variables which can be referenced later in scheduling 
   to parameterize the schedule. Incorporating **sampling instructions** into an operator's schedule
   allows the backend integrator to succinctly describe a design space in terms of <explain how statistical distributions can be used to parameterize an integral quantity like tile size here>.
   ```
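
As an illustration of "describe a finite space, then select an element from it", the sketch below enumerates every ordered factorization of a loop extent into `n` tile sizes and picks one uniformly at random. This is an assumption-laden toy, not the implementation behind `sample_perfect_tile`: the helper names `perfect_tiles` and `sample_tile` are hypothetical, and the source does not state which distribution the real primitive uses.

```python
import random
from typing import List, Tuple


def perfect_tiles(extent: int, n: int) -> List[Tuple[int, ...]]:
    """Enumerate every way to write `extent` as an ordered product of `n` positive integers."""
    if n == 1:
        return [(extent,)]
    tiles: List[Tuple[int, ...]] = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in perfect_tiles(extent // factor, n - 1):
                tiles.append((factor,) + rest)
    return tiles


def sample_tile(extent: int, n: int, rng: random.Random) -> Tuple[int, ...]:
    """Draw one element of the finite space, here uniformly at random (an assumption)."""
    return rng.choice(perfect_tiles(extent, n))


space = perfect_tiles(8, 2)  # the whole design space for one loop of extent 8
choice = sample_tile(8, 2, random.Random(0))  # one point in that space
```

Under this reading the random variable ranges over a finite, enumerable set, and the distribution only decides how often each element is visited during search.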

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -211,7 +267,7 @@ explore the design space. The figure below briefly illustrates the workflow of t
 
 **Trace**. To represent the design space defined by the meta schedule DSL, the underlying system
 records all the instructions users applied to the schedule class, including sampling and schedule
-primitives. We call this list of instructions a trace.
+primitives. This list of instructions a trace is called a trace.

Review comment:
       clarify

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-execute either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule could repetitively invokes such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)
+Effectively, it is equivalent to random exploration without trace,
+allowing the flexibility for users to define arbitrary functions
 that trace may not well support (e.g. control flow divergence based on the value of intermediate
 random variables), but it forbids future opportunity of any trace-based analysis.
 
-**Random search by replaying traces.** Traces are obtained from a design space generator, and
-replayed with a builtin interpreter in our system. If sampling instructions are present on the
-traces, then their random decisions are mutated during each replay, i.e. jumps to a new point in the
+#### Random search by replaying traces
+
+A builtin interpreter directly replays the traces obtained
+from manual schedule, template-based or template-free design space generation.
+If sampling instructions are present on the traces,
+then their random decisions are mutated during each replay, i.e. jumps to a new point in the
 design space. Therefore, repetitive replay of those traces are equivalent to exploration of the
-design space. Our system could potentially benefit from trace-based analysis, including rejecting
-obviously invalid schedules (e.g. using too much CUDA resources), doing dead-code elimination to
-simplify a trace, extracting trace-based features used in the cost model, etc.
+design space. meta schedule could potentially benefit from trace-based analysis, making the search more
+efficient, including rejecting obviously invalid schedules (e.g. using too much CUDA resources),
+doing dead-code elimination to simplify a trace, extracting trace-based features used in the cost
+model, etc.
 
-**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets
-of rules:
+#### Cost-model-guided evolutionary search
+
+A more efficient exploration strategy, introduced in the Ansor.

Review comment:
       which section is "the Ansor"?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-execute either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule could repetitively invokes such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs)
+Effectively, it is equivalent to random exploration without trace,
+allowing the flexibility for users to define arbitrary functions
 that trace may not well support (e.g. control flow divergence based on the value of intermediate
 random variables), but it forbids future opportunity of any trace-based analysis.
 
-**Random search by replaying traces.** Traces are obtained from a design space generator, and
-replayed with a builtin interpreter in our system. If sampling instructions are present on the
-traces, then their random decisions are mutated during each replay, i.e. jumps to a new point in the
+#### Random search by replaying traces
+
+A builtin interpreter directly replays the traces obtained
+from manual schedule, template-based or template-free design space generation.
+If sampling instructions are present on the traces,
+then their random decisions are mutated during each replay, i.e. jumps to a new point in the
 design space. Therefore, repetitive replay of those traces are equivalent to exploration of the
-design space. Our system could potentially benefit from trace-based analysis, including rejecting
-obviously invalid schedules (e.g. using too much CUDA resources), doing dead-code elimination to
-simplify a trace, extracting trace-based features used in the cost model, etc.
+design space. meta schedule could potentially benefit from trace-based analysis, making the search more

Review comment:
       rather than saying "trace-based analysis," maybe a better way to phrase is "The Meta Schedule search rate could be improved by allowing traces to be analyzed before they are run. For example, trace analysis could reject obviously-invalid schedules (e.g. using too many CUDA resources), ... before they are run."
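
A minimal sketch of what "analyze a trace before it is run" could look like, assuming a trace is a plain list of instruction records. The record layout (`name`, `axis`, `extent`) and the helpers `threads_used` and `is_obviously_invalid` are hypothetical; the 1024-thread limit is the CUDA constraint mentioned in the RFC's postprocessor discussion.

```python
from typing import Dict, List

MAX_THREADS_PER_BLOCK = 1024  # CUDA limit cited in the RFC text


def threads_used(trace: List[Dict]) -> int:
    """Multiply the extents of every loop the trace binds to a thread axis."""
    total = 1
    for inst in trace:
        if inst["name"] == "bind" and inst["axis"].startswith("threadIdx"):
            total *= inst["extent"]
    return total


def is_obviously_invalid(trace: List[Dict]) -> bool:
    """Reject a trace that would launch more threads per block than the limit allows."""
    return threads_used(trace) > MAX_THREADS_PER_BLOCK


good = [
    {"name": "split"},
    {"name": "bind", "axis": "threadIdx.x", "extent": 32},
    {"name": "bind", "axis": "threadIdx.y", "extent": 32},
]
bad = [
    {"name": "bind", "axis": "threadIdx.x", "extent": 64},
    {"name": "bind", "axis": "threadIdx.y", "extent": 32},
]

# Only traces that pass the static check would be compiled and measured.
to_measure = [t for t in [good, bad] if not is_obviously_invalid(t)]
```

The check inspects only the recorded instructions, so an invalid candidate is dropped without building or running a kernel.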







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647846422



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)

Review comment:
       Yeah we can add such a primitive. The set of primitives can always be extended.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661759570



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can easily be swapped for a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be defined by a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
+
+**Customize composite schedule in Python**. We provide two ways to define a composite schedule in Python:
+
+Method 1. A simple decorator that converts a Python function into a composite schedule:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, which provides extra functionality such as initialization:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm in Python by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database
+* Measure callbacks
+* Feature extractor
+* Program builder & runner
+* Analysis methods
+* ...
+
+In short, almost every component of the system is decoupled from the others, so extensions can be plugged in easily.
+
+### 4.4. Upstreaming Plan
+
+[M3a] Core infrastructure of the PPL
+* Instruction
+* Trace
+* Composite schedule
+* Sampler
+* Search policy
+* Design space generator
+
+[M3b] Host-side search infra
+* Database
+* Cost model
+* Measure callback
+
+[M3c] RPC-related search infra
+* Measure input, build result, measure result
+* Builder
+* Runner
+
+[M4a] Implementation of rules
+* Various built-in composite schedules
+* Various built-in mutators
+* Various built-in postprocessors
+* Automatic tensorization
+
+[M4b] Relay integration
+

Review comment:
       Great idea! @tqchen and I have discussed the documentation many times, not only for TensorIR/meta schedule but for the entire TVM project. We would love to overhaul the documents more systematically :-)
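
As one example of what such documentation could walk through, the cost-model-guided evolutionary search from section 4.2 of the quoted RFC text can be sketched in plain Python. This is only a toy model under stated assumptions, not the Meta Schedule implementation: a candidate is just a list of tile sizes, and `mutate`, `postprocess`, `toy_cost_model`, and `evolutionary_search` are hypothetical names invented for this illustration. Where exactly epsilon-greedy is applied is also an assumption; here it is used when selecting survivors for the next round.

```python
import random
from typing import Callable, List, Optional

Candidate = List[int]  # hypothetical stand-in for one point in the design space


def mutate(cand: Candidate, rng: random.Random) -> Candidate:
    """Mutator: jump to a neighbor by moving a factor of 2 between two tiles."""
    sources = [i for i, t in enumerate(cand) if t % 2 == 0]
    if not sources or len(cand) < 2:
        return list(cand)
    src = rng.choice(sources)
    dst = rng.choice([i for i in range(len(cand)) if i != src])
    out = list(cand)
    out[src] //= 2
    out[dst] *= 2
    return out


def postprocess(cand: Candidate, max_threads: int = 1024) -> Optional[Candidate]:
    """Postprocessor: reject candidates whose last tile (assumed to be the
    thread count) exceeds the limit."""
    return cand if cand[-1] <= max_threads else None


def toy_cost_model(cand: Candidate) -> float:
    """Toy score (higher is better): prefers balanced tile sizes."""
    return -float(max(cand) - min(cand))


def evolutionary_search(
    init: Candidate,
    cost_model: Callable[[Candidate], float],
    rng: random.Random,
    iters: int = 20,
    population: int = 16,
    top_k: int = 4,
    eps: float = 0.2,
) -> List[Candidate]:
    pool = [init]
    for _ in range(iters):
        # Mutate members of the pool, then filter through the postprocessor.
        children = []
        for _ in range(population):
            child = postprocess(mutate(rng.choice(pool), rng))
            if child is not None:
                children.append(child)
        ranked = sorted(pool + children, key=cost_model, reverse=True)
        # Epsilon-greedy: usually keep the best predicted candidate for each
        # slot, occasionally keep a random one to preserve exploration.
        pool = []
        for rank in range(top_k):
            if rng.random() < eps:
                pool.append(rng.choice(ranked))
            else:
                pool.append(ranked[min(rank, len(ranked) - 1)])
    # In the real system these would be compiled and measured on device.
    return sorted(pool, key=cost_model, reverse=True)
```

Calling `evolutionary_search([4096, 1, 1], toy_cost_model, random.Random(0))` returns `top_k` candidates whose tile sizes still multiply to 4096, since the mutator only moves factors around.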




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm-rfcs] tqchen commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648514264



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of what meta schedule can express, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine the generic rules that Ansor provides with operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       The main goal is to ensure that we have a good sense of what the architecture looks like. Let us stick with the key data structures, for example the relations among the items in M3a, but not every tiny detail of the classes.
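
To make "the relation of items in M3a" concrete, here is a deliberately small sketch of how an instruction, a trace, and sampling decisions could relate to one another. It is an illustration only, under the assumption that a trace is an ordered list of instructions plus the decisions recorded for its sampling instructions; the `Instruction` and `Trace` classes below are hypothetical and are not the classes proposed in the RFC.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instruction:
    """One scheduling step. `choices` is non-empty only for sampling steps."""
    name: str
    choices: Tuple[int, ...] = ()


@dataclass
class Trace:
    """An ordered list of instructions plus the decisions made at sampling steps."""
    insts: List[Instruction] = field(default_factory=list)
    decisions: Dict[int, int] = field(default_factory=dict)  # inst index -> choice

    def add(self, inst: Instruction, rng: random.Random) -> None:
        # A sampling instruction records its random decision in the trace.
        if inst.choices:
            self.decisions[len(self.insts)] = rng.choice(inst.choices)
        self.insts.append(inst)

    def fork(self) -> "Trace":
        # Forking copies the history so that two branches can diverge.
        return Trace(list(self.insts), dict(self.decisions))

    def resample(self, rng: random.Random) -> "Trace":
        # Same instructions, fresh decisions: another point in the same space.
        new = self.fork()
        for idx, inst in enumerate(new.insts):
            if inst.choices:
                new.decisions[idx] = rng.choice(inst.choices)
        return new


rng = random.Random(0)
base = Trace()
base.add(Instruction("sample_tile_size", (8, 16, 32)), rng)
base.add(Instruction("split"), rng)

# Fork where two alternatives look equally promising; the design space is
# then the union (here simply a list) of the forked traces.
with_vectorize = base.fork()
with_vectorize.add(Instruction("vectorize"), rng)
with_unroll = base.fork()
with_unroll.add(Instruction("unroll"), rng)
design_space = [with_vectorize, with_unroll]
```

In this toy model, a manual schedule corresponds to a trace with no sampling instructions, and a search strategy only needs `resample` (or a mutator over `decisions`) to move around the space.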







[GitHub] [tvm-rfcs] tqchen commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r648507080



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of what meta schedule can express, and that meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.

Review comment:
       I agree given it is mainly about the infrastructure.







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668227109



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:

Review comment:
       Yeah, when drafting this RFC I was a bit uncertain whether we needed a subheading, because it is a single paragraph, but it seems all of us prefer a subheading, so will do. Thanks!







[GitHub] [tvm-rfcs] tqchen commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-886048659


   Thanks everyone. It would be great to land this RFC :) @junrushao1994 please do your best to address the latest batch of comments from @areusch, and then I think we can merge this in.





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676828227



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -147,17 +203,20 @@ sch.reorder(
 
 ### 3.4. AutoScheduler-style Design Space Generation
 
-AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage.
-SearchRule analyzes TE and eagerly trigger schedule primitives accordingly in its internally
-maintained mini IR.
+To generate design space, AutoScheduler (Ansor) applies a set of rules to each TE stage.

Review comment:
       > To generate design space, AutoScheduler applies a set of rules to each [TE stage](https://tvm.apache.org/docs/api/python/te.html#tvm.te.Stage) that corresponds to a [TE operation](https://tvm.apache.org/docs/api/doxygen/classtvm_1_1te_1_1Operation.html), defined by [`te.compute(...)`](https://tvm.apache.org/docs/api/python/te.html#tvm.te.compute).
   







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664860038



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**

Review comment:
       I sort of think it is better to make it inlined like a topic sentence of a paragraph, so I am going to merge this to the subsequent paragraph







[GitHub] [tvm-rfcs] tqchen commented on pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#issuecomment-887115326


   Thanks everyone. This RFC is now accepted. Assigning RFC number 0001





[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661753097



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)

Review comment:
       Yes. It is the opposite of "imperfect tiling", where the tile sizes do not evenly divide the loop extent
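
To make the distinction concrete, here is a minimal pure-Python sketch of what "perfect" means, assuming `n` is the number of tiles and that the product of the sampled tile sizes must equal the loop extent. The function below is a standalone toy that only borrows the name `sample_perfect_tile`; it takes an integer extent instead of a loop handle and is not the TVM implementation:

```python
import random
from typing import List, Optional


def sample_perfect_tile(extent: int, n: int, rng: Optional[random.Random] = None) -> List[int]:
    """Toy sampler: draw n tile sizes whose product is exactly `extent`."""
    rng = rng or random.Random()
    factors: List[int] = []
    remaining = extent
    for _ in range(n - 1):
        # Only divisors of what is left are allowed, so no tile is ever "imperfect".
        divisors = [d for d in range(1, remaining + 1) if remaining % d == 0]
        choice = rng.choice(divisors)
        factors.append(choice)
        remaining //= choice
    # The last tile absorbs whatever is left, which keeps the product exact.
    factors.append(remaining)
    return factors
```

Because every draw is restricted to divisors of the remaining extent, the split never needs a boundary check or padding, which is what an imperfect tiling would require.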







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r676888273



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -266,55 +322,83 @@ sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
 ### 4.2. Exploring the Design Space
 
 Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search
-for efficient schedules.
-
-**Random search by replaying schedule functions.** With a user-provided schedule function
-as a black-box design space generator, our system could repetitively invoke such an opaque function
-without doing any extra analysis. The function could be written in C++ or Python, or any language
-that implements packed function FFI. If sampling instructions are present in the function, then each
-invocation results in a different IRModule after being scheduled because the random decisions are
-possibly changed across different runs. Effectively, it is equivalent to
-random exploration without trace, allowing the flexibility for users to define arbitrary functions
+for efficient schedules. Those strategies are mostly supported by re-executing either a function or a
+trace with a builtin interpreter in meta schedule, and this process is called **replay**.
+
+#### Random search by replaying schedule functions
+
+With a user-provided schedule function
+as a black-box design space generator, meta schedule could repeatedly invoke such an opaque TVM
+packed function without doing any extra analysis.
+If sampling instructions are present in the trace, then scheduling is non-deterministic
+(random decisions may not be repeated across runs).
+Effectively, it is equivalent to random exploration without trace,
+allowing the flexibility for users to define arbitrary functions
 that trace may not well support (e.g. control flow divergence based on the value of intermediate
 random variables), but it forbids future opportunity of any trace-based analysis.
 
-**Random search by replaying traces.** Traces are obtained from a design space generator, and
-replayed with a builtin interpreter in our system. If sampling instructions are present on the
-traces, then their random decisions are mutated during each replay, i.e. jumps to a new point in the
+#### Random search by replaying traces
+
+A builtin interpreter directly replays the traces obtained
+from manual schedule, template-based or template-free design space generation.
+If sampling instructions are present on the traces,
+then their random decisions are mutated during each replay, i.e. jumps to a new point in the
 design space. Therefore, repetitive replay of those traces is equivalent to exploration of the
-design space. Our system could potentially benefit from trace-based analysis, including rejecting
-obviously invalid schedules (e.g. using too much CUDA resources), doing dead-code elimination to
-simplify a trace, extracting trace-based features used in the cost model, etc.
+design space. Meta schedule could potentially benefit from trace-based analysis, making the search more

Review comment:
       That makes sense. I also added feature extraction related stuff
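
As a rough illustration of the trace replay discussed in this hunk, the sketch below models a trace as a list of instructions and redraws the decision of every sampling instruction on each replay. All names here (`Instruction`, `replay`, `random_search`, the `fixed` argument) are hypothetical and chosen for this toy only; the real interpreter also applies the schedule primitives to the IR, which is omitted:

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instruction:
    kind: str                      # "sample" or a schedule primitive such as "split"
    name: str                      # the random variable or loop the instruction targets
    choices: Tuple[int, ...] = ()  # candidate values; only used by "sample"


def replay(
    trace: List[Instruction],
    rng: random.Random,
    fixed: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Re-execute a trace and return the decision made for each sampling instruction."""
    decisions: Dict[str, int] = {}
    for inst in trace:
        if inst.kind != "sample":
            continue  # deterministic primitives need no decision
        if fixed is not None and inst.name in fixed:
            decisions[inst.name] = fixed[inst.name]  # keep a pinned decision
        else:
            decisions[inst.name] = rng.choice(inst.choices)  # redraw the decision
    return decisions


def random_search(trace: List[Instruction], num_trials: int, seed: int = 0) -> List[Dict[str, int]]:
    """Each replay jumps to a (possibly new) point in the design space."""
    rng = random.Random(seed)
    return [replay(trace, rng) for _ in range(num_trials)]
```

Since the trace is plain data in this toy, an analysis pass could inspect it before any replay, which is the kind of opportunity a black-box schedule function does not offer.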







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r668275556



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,444 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the
+approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and design space.** In TVM TensorIR, optimization of a TensorIR program is done via a
+sequence of transformations. For example, we reorder loops for better locality and we tensorize for
+specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is
+called "**scheduling**", and each transformation is called a "**schedule primitive**". These
+primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs.
+**Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the current scheduling system.** Currently we have 3 sets of scheduling APIs:
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule templates" as the design
+  space for each operator. Therefore, it does not scale to hundreds of operators and a variety
+  of hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space,
+  according to a set of predefined "search rules". However, it is non-trivial to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefits of Meta Schedule.**  Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible in every one of its components. Every component of the
+  system can be customized easily in pure Python or C++ or both. For example, one can develop a new
+  design space generator in Python, a new ProgramRunner in Python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe
+and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to repetitive and verbose
+scheduling code, as
+[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
+by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite schedules" that apply

Review comment:
       Yeah it is not quite the same as the existing codebase.
   
   I was trying to use the word "composite" here against "primitive", which means it is a transformation "composed" of those "primitives". For example, if we have a TIR like: 
   
   ```python
   @tvm.script.tir
   def example_func(...):
     for i, j in ...:
       with tir.Block("B") ...:
         B[i, j] = A[i, j] + 1
     for i, j in ...:
       with tir.Block("C") ...:
         C[i, j] = B[i, j] + 1
     for i, j in ...:
       with tir.Block("D") ...:
         D[i, j] = C[i, j] + 1
   ```
   
   And suppose we have a composite schedule called `InlineAllElementwiseOperations`, which inlines all the elementwise computation into their consumers, then we can invoke it by:
   
   ```python
   sch = tir.Schedule(example_func)
   InlineAllElementwiseOperations().apply(sch, sch.get_block("D"))
   ```
   
   Then the resulting TIR is:
   
   ```python
   @tvm.script.tir
   def example_func(...):
     for i, j in ...:
       with tir.Block("D") ...:
         D[i, j] = A[i, j] + 1 + 1 + 1
   ```
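
The same "composed of primitives" idea can be sketched without TVM at all. The toy model below represents each block as a named elementwise function, treats `compute_inline` as a primitive that folds one producer into its consumer, and builds the composite by calling that primitive repeatedly. Everything here is an illustrative assumption: the `Stage` alias and both functions are hypothetical, and the toy `compute_inline` only shares its name with the TIR primitive:

```python
from typing import Callable, List, Tuple

# A stage is (block name, elementwise function applied to the producer's value).
Stage = Tuple[str, Callable[[int], int]]


def compute_inline(stages: List[Stage], index: int) -> List[Stage]:
    """Toy primitive: fold stage `index` into its consumer at `index + 1`."""
    _, producer = stages[index]
    consumer_name, consumer = stages[index + 1]
    fused = (consumer_name, lambda x, f=producer, g=consumer: g(f(x)))
    return stages[:index] + [fused] + stages[index + 2:]


def inline_all_elementwise(stages: List[Stage]) -> List[Stage]:
    """Toy composite: apply the primitive until a single stage remains."""
    while len(stages) > 1:
        stages = compute_inline(stages, 0)
    return stages


# Blocks B, C and D from the example above, each adding 1 to its input.
pipeline = [
    ("B", lambda a: a + 1),
    ("C", lambda b: b + 1),
    ("D", lambda c: c + 1),
]
(name, fused), = inline_all_elementwise(pipeline)
```

After the composite runs, only block `D` is left and it computes `A[i, j] + 1 + 1 + 1` in one step, mirroring the resulting TIR shown above.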







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647967336



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the design space of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The probabilistic program (PPL) written this way generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       Hmmm tbh I don't understand this term either...TK told me it means developer facing stuff







[GitHub] [tvm-rfcs] junrushao1994 commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Posted by GitBox <gi...@apache.org>.
junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r661793736



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it does not scale to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying its SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design spaces induced by TE manual schedule, AutoTVM and Ansor are all subsets of the design space of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The probabilistic program (PPL) written this way generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking the union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the number of threads per block should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find candidate schedules in the design space, then applies postprocessors and asks the cost model to predict their performance. After several iterations, the new schedules with the highest predicted scores are compiled and measured on device. Epsilon-greedy selection is used in this process to balance exploitation and exploration.
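
The loop described above can be sketched as follows. This is a minimal toy, not the proposed implementation: a design point is reduced to a pair of thread extents, and `mutate`, `postprocess`, `toy_score`, and all parameters of `evolutionary_search` are hypothetical names and values chosen for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]  # toy design point: (threads_x, threads_y)
CHOICES = [1, 2, 4, 8, 16, 32, 64, 128]


def mutate(p: Point, rng: random.Random) -> Point:
    # Mutator: jump to a neighbor by re-sampling a single decision.
    idx = rng.randrange(len(p))
    new = list(p)
    new[idx] = rng.choice(CHOICES)
    return (new[0], new[1])


def postprocess(p: Point) -> Optional[Point]:
    # Postprocessor: reject points that exceed 1024 threads per block.
    return p if p[0] * p[1] <= 1024 else None


def evolutionary_search(
    cost_model: Callable[[Point], float],
    measure: Callable[[Point], float],
    rng: random.Random,
    population: int = 16,
    iters: int = 4,
    num_measure: int = 4,
    eps: float = 0.25,
) -> Point:
    # Initial population: random points that pass the postprocessor.
    pop: List[Point] = []
    while len(pop) < population:
        cand = postprocess((rng.choice(CHOICES), rng.choice(CHOICES)))
        if cand is not None:
            pop.append(cand)
    for _ in range(iters):
        children = [postprocess(mutate(p, rng)) for p in pop]
        pool = set(pop) | {c for c in children if c is not None}
        # Keep the candidates with the highest predicted score.
        pop = sorted(pool, key=lambda p: (-cost_model(p), p))[:population]
    # Epsilon-greedy: mostly the top predictions, plus a few random picks.
    n_random = int(num_measure * eps)
    picked = pop[: num_measure - n_random]
    rest = pop[num_measure - n_random:]
    picked += rng.sample(rest, min(n_random, len(rest)))
    # Only the picked candidates are "measured on device".
    return max(picked, key=measure)


def toy_score(p: Point) -> float:
    # Stand-in for both the cost model and the measurement.
    return -abs(p[0] * p[1] - 256)


best = evolutionary_search(toy_score, toy_score, random.Random(0))
```

The design choice worth noting is that the cost model is queried for every candidate, while the expensive measurement runs only on the few candidates selected at the end.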
+
+### 4.3. Python first for flexibility & customizability
+
+We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
+
+While all the important APIs are implemented in C++ for efficiency, every part of the system can easily be switched to a customized Python implementation. For example:
+
+**Customize design space in Python**. The design space can be defined by a Python function that performs the scheduling:
+
+```python
+def schedule_matmul(sch: Schedule) -> Schedule:
+    i, j, k = sch.get_loops(sch.get_block("matmul"))
+    i_tiles = sch.sample_perfect_tile(i, n=4)
+    j_tiles = sch.sample_perfect_tile(j, n=4)
+    k_tiles = sch.sample_perfect_tile(k, n=2)
+    # Tile the loops according to the random variables
+    i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+    j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+    k_0, k_1 = sch.split(loop=k, factors=k_tiles)
+    # Organize the loops into "SSRSRS" 6-level tiles
+    sch.reorder(
+        i_0, j_0, # S
+        i_1, j_1, # S
+        k_0,      # R
+        i_2, j_2, # S
+        k_1,      # R
+        i_3, j_3, # S
+    )
+    return sch
+```
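
To make the sampling step concrete, the sketch below shows one way a perfect tiling could be drawn: enumerate every list of `n` positive integers whose product equals the loop extent, then pick one. It is an assumption-laden illustration, not the behavior of the actual `sample_perfect_tile` primitive, which may use a different distribution or apply additional constraints. The helper `perfect_tiles` is hypothetical, and uniform choice is assumed here.

```python
import random
from typing import List


def perfect_tiles(extent: int, n: int) -> List[List[int]]:
    """Enumerate every list of n positive integers whose product is extent."""
    if n == 1:
        return [[extent]]
    result = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            # Fix the outermost factor, then tile the remaining extent.
            for rest in perfect_tiles(extent // factor, n - 1):
                result.append([factor] + rest)
    return result


def sample_perfect_tile(extent: int, n: int, rng: random.Random) -> List[int]:
    # Assumption: every perfect tiling is equally likely.
    return rng.choice(perfect_tiles(extent, n))


factors = sample_perfect_tile(1024, 4, random.Random(0))
```

Because the factors always multiply back to the extent, the subsequent `split` never needs padding or a remainder loop.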
+
+**Customize composite schedule in Python**. We provide two ways to define a composite schedule in Python:
+
+Method 1. A simple decorator that converts a Python function to a composite schedule:
+
+```python
+@tir.as_composite_schedule(name="multi-level-tiling")
+def multi_level_tiling(sch: Schedule, block: BlockRV) -> Union[Schedule, List[Schedule]]:
+    ...
+```
+
+Method 2. Derive from `PyCompositeSchedule`, which provides extra functionality such as initialization:
+
+```python
+class MultiLevelTiling(PyCompositeSchedule):
+    def initialize(...):
+        ...
+    def apply(...):
+        ...
+```
+
+**Customize exploration strategies in Python**. Developers can also implement any search algorithm in Python by deriving from `PySearchPolicy`.
+
+**Other customizable components**. This list includes:
+
+* Cost model
+* Database

Review comment:
       for storing and managing tuning logs



