You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/06/08 21:04:33 UTC

[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #5: [RFC] Meta Schedule (AutoTensorIR)

comaniac commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r647775615



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)

Review comment:
       Just out of curiosity, will we have something like `sample_tile` that samples not only the factors of `i` but all integers in `[1, i)`?

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.

Review comment:
       IMHO, this subsection can be merged to 3.3 as they are basically talking about the same thing. Using Ansor as an example when introducing the composite schedule can also help readers understand its purpose easily.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.

Review comment:
       Didn't understand this point. Are you saying a design space is composite of many traces? If so, it might be clearer to just name this point "Design Space", and say a trace with sampled values is a design point in the space, where the space is composed of traces with all possible combinations.
   
   In addition, please be careful about the term "design space" and "explored points". Former is an entire space that potentially be explored during the search, regardless whether a point will be visited or not. Latter is a part of the design space that is explored during the search. From the description of "Union of traces" and "Fork a trace", I feel the "design space" you are referring to here is just the explored traces.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation

Review comment:
       Might be better to first explain this term.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.

Review comment:
       This might be improved in the future. I can imagine that the failure rate of resource limited devices would be very high, and the tuning time will be spent mostly on generating a valid program.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.
+
+**Random search**. Extracts the PPL, and repetitively re-executes the PPL by flipping coins purely randomly.
+
+**Cost-model-guided evolutionary search**. A more efficient exploration strategy. We define two sets of rules:
+
+* Mutator: defines how to jump to a point’s “neighbor” in the design space
+* Postprocessor: sometimes it is non-trivial to statically determine the PPL, for example:
+  * There is a hard requirement in CUDA that the maximum number of threads should not exceed 1024, but it is a random variable that cannot be determined before actually executing the PPL. In this case, we write a postprocessor that errors out when the condition is not satisfied.
+  * The number of outer loops to be fused together depends on their extents, which are random variables. In this case, we annotate the maximum extent allowed on the block, and do actual fusion in a postprocessor.
+
+Our evolutionary search algorithm uses mutators to find possible schedules in the design space, then applies postprocessors, and asks the cost model to predict its performance. After several iterations, the new schedules with the highest scores are finally compiled and measured on device. Epsilon-greedy is used in this process to balance exploitation and exploration.
+
+### 4.3. Python first for flexibility & customizability
+
+We engineer the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.

Review comment:
       ```suggestion
   We implement the system in a way that all levels are decoupled and open to customization, aiming at providing a playground for developers to try out new ideas and potentially deliver performance quickly.
   ```

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces induced by the three special cases, our system allows developers to combine generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation system to extract and explore the design space.
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system records all the instructions users applied to the schedule class, including sampling and schedule primitives. We call this list of instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+The trace is not directly user-facing, but a data structure inside the user-facing `Schedule` class that records the execution. The automation system extracts the trace and finds out the design space according to the sampling instructions.
+
+**Union of traces**. Often a single trace is unable to represent the entire space. Therefore, more precisely, our system works on a list of traces as the union of potential design space.
+
+**Fork a trace**. When two different decisions in the scheduling process are equally important to generate high-performance schedules, we allow forking the trace into two, and the design space is the union of the forked traces.
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the schedule program that generates the PPL, and doesn’t use any advantage provided by the PPL.

Review comment:
       This seems not a "search" approach but just the way to apply a trace? Also "PPL" is first used here so better to provide the full name.

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,297 @@
+* Feature Name: Meta Schedule (AutoTensorIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta schedule provides a pragmatic way to define the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like tensorization and loop partitioning, and customizability on every layer of the automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of transformations. For example, we reorder loops for better locality and we tensorize for specific hardware intrinsics. The process of invoking such a set of pre-defined transformations is called “**scheduling**”, and each transformation is called a “**schedule primitive**”. These primitives form a domain-specific language (DSL) describing the transformation of TensorIR programs. **Design space** is the set of all possible schedulings with respect to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, i.e. explore points in the design space with humans in the loop. This can be a tedious and error-prone approach, hence the creation of AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define “schedule templates” as the design space for each operator. Therefore, it is inextensible to hundreds of operators.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as the design space, according to a set of predefined “search rules”. However, it is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining).
+* The three systems above have isolated sets of APIs with several layers of their own abstraction, which are not only hard to learn, but also engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and loop partitioning. Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is customizable across every layer.
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the performance of the generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters that are hard to determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. Developers may manually enumerate possible combinations of these unknown factors, and then pick the best schedule according to measurement results on their device.
+
+### 3.2. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When included in a schedule, these instructions parametrize the schedule from a single deterministic point to a space supported by random variables (tile size, etc.), making it possible for developers to describe the design space with meta schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into “SSRSRS” 6-level tiles
+sch.reorder(
+    i_0, j_0, # S
+    i_1, j_1, # S
+    k_0,      # R
+    i_2, j_2, # S
+    k_1,      # R
+    i_3, j_3, # S
+)
+```
+
+### 3.3. Composite Schedule
+
+Each schedule primitive handles only a very basic operation to transform the IR, for example, `split` only splits a loop into two. In the real world, the over-fine granularity of those primitives usually leads to repetitive and verbose scheduling code, as [mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994) by developers in our community.
+
+To counter this challenge, we allow users to register “composite schedules” that analyze the IR, and apply a set of schedule primitives correspondingly. For instance, a composite schedule may inspect a TensorIR block and decide whether we should call `compute_inline` on it. The composite schedule may use sampling instructions to fill in undecided choices.
+
+Our system also ships with some built-in composite schedules, including:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* …
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their SearchRules to each stage. Meta schedule treats a search rule as a composite schedule, and applies each composite schedule to each block of TensorIR to generate the design space.
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no randomness introduced by sampling instructions. It is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. Writing one or more schedule functions in meta schedule, potentially with sampling instructions, is a natural representation of AutoTVM’s schedule templates (knobs). The PPL generates one or more traces as the design space to explore.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous section, application of composite schedule rules generates the design space, which is equivalent to Ansor’s sketch generation.

Review comment:
       These are more like a summary of previous subsections. It would be more exciting and clear if this subsection can just provide an example that mixes 3 styles.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org