You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/08 20:19:02 UTC
[GitHub] [tvm-rfcs] areusch commented on a diff in pull request #63: RFC: clarifying buffer declaration and access

areusch commented on code in PR #63:
URL: https://github.com/apache/tvm-rfcs/pull/63#discussion_r846438607


##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be

Review Comment:
   ```suggestion
   - Inside the `buffer_map` of `PrimFunc`. TIR's type system does not accommodate rich array types, instead representing them as `T.handle` (typically emitted as `void*`). The `buffer_map` specifies how to interpret such `T.handle` when using it as a basis for array accesses.
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data

Review Comment:
   ```suggestion
   accesses in runtime. Note that data type given to `Load` / `Store` can be different from the Buffer's data
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be

Review Comment:
   ```suggestion
   returns a data variable of type `T.handle` (since TIR's type system does not accommodate rich arrays), which may be
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to

Review Comment:
   do you mean `T.decl_buffer`?



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need

Review Comment:
   ```suggestion
   another buffer. Buffer created via `T.alloc_buffer` always do not alias. Buffer aliases do not need
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.

Review Comment:
   ```suggestion
   reinterpreted with a different shape or data type using `T.decl_buffer`.
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.

Review Comment:
   suggest: "Instead, `T.decl_buffer` returns either:
   1. a Buffer node whose data member points to the aliased Buffer.
   2. A Buffer node whose data member is a new pointer-type Var (the var is expected to be initialized elsewhere)"
   
   not sure on the last line there--could you clarify it?



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur

Review Comment:
   nit: further



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 

Review Comment:
   suggest to convert to a markdown list



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`

Review Comment:
   ```suggestion
   In low-level TIR, `tir::Allocate` is used to allocate a data variable with given shapes. `tir::Allocate`
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 

Review Comment:
   are there any such utility visitor classes to produce a map of aliased Buffer?



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.

Review Comment:
   could you state the implication of "Buffer aliases do not need `Allocate` to create the data variable"? e.g. I think you mean something like "the `data` variable could point to any aliased or unaliased buffer."



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)

Review Comment:
   suggest we describe the purpose of this grouping to give Buffer more of a scope. maybe (but this is testing my understanding of the Buffer changes):
   Buffer is a compile-time representation of contiguous block of memory. Since a Buffer is typically used as backing storage for a TensorType, it includes relevant information from that TensorType which can be sufficiently generalized to an array, such as data type and shape information.



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or

Review Comment:
   ```suggestion
   However, `BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between
+`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. 
+
+Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access
+via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer
+that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type.
+For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions
+(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such
+value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as
+multiple `Ramp` creates additional complexity.
+
+Different combinations of buffer type and index type (scalar vs. vector) are clarified in
+
+[RFC#39](https://github.com/Lunderberg/tvm-rfcs/blob/data_layout/rfcs/0039-buffer-physical-layout.md#rationale-and-alternatives),
+excerpts are the following:
+
+```python
+@T.prim_func
+def scalar_load_from_scalar_buffer(A: T.Buffer[(64,), "float32"]):
+    assert A[0].dtype == "float32"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    assert A[0].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_vector_2 = T.buffer_decl([32], "float32x2", data=A.data)
+    assert A[0].dtype == "float32x4"
+    assert A_vector_2[0].dtype == "float32x2"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option1(A: T.Buffer[(64,), "float32"]):
+    assert A[T.ramp(0, 1, 4)].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option2(A: T.Buffer[(64,), "float32"]):
+    A_vector = T.buffer_decl([16], "float32x4", data=A.data)
+    assert A_vector[0].dtype == "float32x4"
+
+@T.prim_func
+def scalar_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_scalar = T.buffer_decl([64], "float32", data=A.data)
+    assert A_scalar[0].dtype == "float32"
+
+#multiple dimensional buffer accesses
+@T.prim_func
+def nd_scalar_load_from_scalar_buffer(A: T.Buffer[(64, 64,), "float32"]):
+    assert A[0, 0].dtype == "float32"
+
+@T.prim_func
+def nd_vector_load_from_scalar_buffer(A: T.Buffer[(64,64), "float32"]):
+    assert A[0, T.ramp(0, 1, 4)].dtype == "float32x4"
+```
+
+In rare cases, vector index can be used to access a vector buffer. We may leave this usage as
+undefinedUnless we have a clear use case.
+
+**VectorBufferRewrite**
+
+In some backend like SPIR-V where runtime pointer casts are not available, even between types that
+differ only in the number of lanes (e.g. `float16` and `float16x4.`) `VectorTypeRewriter` will be

Review Comment:
   ```suggestion
   differ only in the number of lanes (e.g. `float16` and `float16x4.`), `VectorTypeRewriter` will be
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between
+`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. 
+
+Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access
+via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer
+that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type.
+For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions
+(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such
+value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as

Review Comment:
   ```suggestion
   value is the product of each `Ramp`. In the future, we may want to limit `Ramp` to only the last dimension as
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between

Review Comment:
   ```suggestion
   data type. They always return the data type specified on underlying buffer object. This is the fundamental difference between
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between
+`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. 
+
+Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access
+via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer
+that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type.
+For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions
+(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such
+value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as
+multiple `Ramp` creates additional complexity.
+
+Different combinations of buffer type and index type (scalar vs. vector) are clarified in
+
+[RFC#39](https://github.com/Lunderberg/tvm-rfcs/blob/data_layout/rfcs/0039-buffer-physical-layout.md#rationale-and-alternatives),
+excerpts are the following:
+
+```python
+@T.prim_func
+def scalar_load_from_scalar_buffer(A: T.Buffer[(64,), "float32"]):
+    assert A[0].dtype == "float32"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    assert A[0].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_vector_2 = T.buffer_decl([32], "float32x2", data=A.data)
+    assert A[0].dtype == "float32x4"
+    assert A_vector_2[0].dtype == "float32x2"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option1(A: T.Buffer[(64,), "float32"]):
+    assert A[T.ramp(0, 1, 4)].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option2(A: T.Buffer[(64,), "float32"]):
+    A_vector = T.buffer_decl([16], "float32x4", data=A.data)
+    assert A_vector[0].dtype == "float32x4"
+
+@T.prim_func
+def scalar_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_scalar = T.buffer_decl([64], "float32", data=A.data)
+    assert A_scalar[0].dtype == "float32"
+
+#multiple dimensional buffer accesses
+@T.prim_func
+def nd_scalar_load_from_scalar_buffer(A: T.Buffer[(64, 64,), "float32"]):
+    assert A[0, 0].dtype == "float32"
+
+@T.prim_func
+def nd_vector_load_from_scalar_buffer(A: T.Buffer[(64,64), "float32"]):
+    assert A[0, T.ramp(0, 1, 4)].dtype == "float32x4"
+```
+
+In rare cases, vector index can be used to access a vector buffer. We may leave this usage as
+undefinedUnless we have a clear use case.

Review Comment:
   ```suggestion
   undefined until we have a clear use case.
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more

Review Comment:
   1. it is safe to reshape a buffer alias as long as the low-level memory is shared
   
   > as long as the low level memory is shared
   could you clarify this pt?
   
   



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between
+`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. 
+
+Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access
+via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer
+that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type.
+For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions
+(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such
+value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as
+multiple `Ramp` creates additional complexity.
+
+Different combinations of buffer type and index type (scalar vs. vector) are clarified in
+
+[RFC#39](https://github.com/Lunderberg/tvm-rfcs/blob/data_layout/rfcs/0039-buffer-physical-layout.md#rationale-and-alternatives),
+excerpts are the following:
+
+```python
+@T.prim_func
+def scalar_load_from_scalar_buffer(A: T.Buffer[(64,), "float32"]):
+    assert A[0].dtype == "float32"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    assert A[0].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_vector_2 = T.buffer_decl([32], "float32x2", data=A.data)
+    assert A[0].dtype == "float32x4"
+    assert A_vector_2[0].dtype == "float32x2"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option1(A: T.Buffer[(64,), "float32"]):
+    assert A[T.ramp(0, 1, 4)].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option2(A: T.Buffer[(64,), "float32"]):
+    A_vector = T.buffer_decl([16], "float32x4", data=A.data)
+    assert A_vector[0].dtype == "float32x4"
+
+@T.prim_func
+def scalar_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_scalar = T.buffer_decl([64], "float32", data=A.data)
+    assert A_scalar[0].dtype == "float32"
+
+#multiple dimensional buffer accesses
+@T.prim_func
+def nd_scalar_load_from_scalar_buffer(A: T.Buffer[(64, 64,), "float32"]):
+    assert A[0, 0].dtype == "float32"
+
+@T.prim_func
+def nd_vector_load_from_scalar_buffer(A: T.Buffer[(64,64), "float32"]):
+    assert A[0, T.ramp(0, 1, 4)].dtype == "float32x4"
+```
+
+In rare cases, vector index can be used to access a vector buffer. We may leave this usage as

Review Comment:
   ```suggestion
   In rare cases, vector index can be used to access a vector buffer. We leave this usage as
   ```



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between
+`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. 
+
+Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access
+via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer
+that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type.
+For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions
+(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such
+value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as
+multiple `Ramp` creates additional complexity.
+
+Different combinations of buffer type and index type (scalar vs. vector) are clarified in
+
+[RFC#39](https://github.com/Lunderberg/tvm-rfcs/blob/data_layout/rfcs/0039-buffer-physical-layout.md#rationale-and-alternatives),
+excerpts are the following:
+
+```python
+@T.prim_func
+def scalar_load_from_scalar_buffer(A: T.Buffer[(64,), "float32"]):
+    assert A[0].dtype == "float32"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    assert A[0].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_vector_2 = T.buffer_decl([32], "float32x2", data=A.data)
+    assert A[0].dtype == "float32x4"
+    assert A_vector_2[0].dtype == "float32x2"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option1(A: T.Buffer[(64,), "float32"]):
+    assert A[T.ramp(0, 1, 4)].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option2(A: T.Buffer[(64,), "float32"]):
+    A_vector = T.buffer_decl([16], "float32x4", data=A.data)
+    assert A_vector[0].dtype == "float32x4"
+
+@T.prim_func
+def scalar_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_scalar = T.buffer_decl([64], "float32", data=A.data)
+    assert A_scalar[0].dtype == "float32"
+
+#multiple dimensional buffer accesses
+@T.prim_func
+def nd_scalar_load_from_scalar_buffer(A: T.Buffer[(64, 64,), "float32"]):
+    assert A[0, 0].dtype == "float32"
+
+@T.prim_func
+def nd_vector_load_from_scalar_buffer(A: T.Buffer[(64,64), "float32"]):
+    assert A[0, T.ramp(0, 1, 4)].dtype == "float32x4"
+```
+
+In rare cases, vector index can be used to access a vector buffer. We may leave this usage as
+undefinedUnless we have a clear use case.
+
+**VectorBufferRewrite**
+
+In some backend like SPIR-V where runtime pointer casts are not available, even between types that
+differ only in the number of lanes (e.g. `float16` and `float16x4.`) `VectorTypeRewriter` will be
+used to rewrite the buffer to a vector type. (VectorBufferRewrite rewrites the buffer from
+`vector_load_from_scalar_buffer` into `scalar_load_from_vector_buffer` in the above example).
+
+**Removing pre-flattened buffer**

Review Comment:
   could we give short context on why `preflattened_buffer_map` exists in the first place?



##########
rfcs/0063-clarifying-buffer-declaration-and-access.md:
##########
@@ -0,0 +1,232 @@
+- Feature Name: Clarifying Buffer Declaration and Access
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-03-18
+- RFC PR: [apache/tvm-rfcs#63](https://github.com/apache/tvm-rfcs/pull/63)
+- GitHub Issue: [apache/tvm#10505](https://github.com/apache/tvm/issues/10505)
+
+# Summary 
+[summary]: #summary
+
+In https://github.com/apache/tvm/pull/9727 and
+[RFC#39](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md), we
+deprecated `Load` and `Store` to use `BufferLoad` and `BufferStore` instead in order to support
+generalized multi-dimensional physical buffer access. Here we document necessary clarifications,
+implications about the new buffer convention, as well as the post-hoc pass checklist.
+
+# Motivation
+[motivation]: #motivation
+
+The main goal of this RFC is to summarize the existing buffer convention and the IR changes in
+https://github.com/apache/tvm/pull/9727 which have a broader impact. There are no new semantics
+proposed in this RFC.
+
+# Reference - level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+**What’s a buffer?**
+
+Buffer is a compile-time object that groups runtime objects(data pointer and shape variables)
+structurally. A Buffer needs to be declared and allocated before it can be used.
+
+**Declaration of buffer**
+
+Buffer can be declared in the following ways:
+
+- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be
+interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`).
+- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer.
+- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to
+reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer.
+`T.buffer_decl` can also be used to create a buffer alias with a different `elem_offset`.
+`elem_offset` should be handled during the lowering process.
+
+Examples of `T.buffer_decl` is shown below.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+
+@T.prim_func
+def buffer_alloc():
+    A = T.buffer_decl([4, 4], "float32")
+    Allocate(A.data, [16], "float32")
+```
+
+**Allocation of buffer**
+
+In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate`
+doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be
+reinterpreted with a different shape or data type in buffer declaration.
+
+**Explicit `DeclBuffer` IR construct**
+
+`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR.
+The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning
+of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the
+migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur
+discussed in a separate RFC.
+
+**Buffer Aliasing**
+
+`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with
+another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need
+`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once.
+If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop
+that contains an allocation), the transform should update the allocations to be unique using
+`tvm::tir::ConvertSSA`
+
+Buffers should not alias each other unless necessary, because aliased buffers increase complexity
+for TIR transformations.  Passes that rewrite buffers should clearly indicate how aliased buffers
+are handled.  For example, when changing the underlying layout of stored elements in a buffer, all
+buffer aliases must also be updated.  While buffer aliasing is typically free at runtime, this
+imposes a cost for buffer aliasing both to compile times and development complexity. 
+
+**Discussion: When it is safe to transform a buffer**
+
+We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules:
+
+(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. 
+
+(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more
+cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer
+shape and then rewrite the buffer shape. This need us to take all alias into consideration and then
+rewrite their shapes together.
+
+**Generalizing buffer accesses**
+
+Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store`
+consist of data variable, data type and index, which can be directly translated to pointer cast and
+accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data
+variable type. For example,
+
+```python
+A = T.buffer_decl(shape=(16,), dtype='float')
+T.load("float4", A.data, T.ramp(4, 1, 4))
+```
+
+can be translated to
+
+```cpp
+*((float4*)(A + 4))
+```
+
+in C codegen. 
+
+`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or
+data type. They rely on the underlying buffer object. This is the fundamental difference between
+`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. 
+
+Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access
+via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer
+that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type.
+For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions
+(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such
+value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as
+multiple `Ramp` creates additional complexity.
+
+Different combinations of buffer type and index type (scalar vs. vector) are clarified in
+
+[RFC#39](https://github.com/Lunderberg/tvm-rfcs/blob/data_layout/rfcs/0039-buffer-physical-layout.md#rationale-and-alternatives),
+excerpts are the following:
+
+```python
+@T.prim_func
+def scalar_load_from_scalar_buffer(A: T.Buffer[(64,), "float32"]):
+    assert A[0].dtype == "float32"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    assert A[0].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_vector_2 = T.buffer_decl([32], "float32x2", data=A.data)
+    assert A[0].dtype == "float32x4"
+    assert A_vector_2[0].dtype == "float32x2"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option1(A: T.Buffer[(64,), "float32"]):
+    assert A[T.ramp(0, 1, 4)].dtype == "float32x4"
+
+@T.prim_func
+def vector_load_from_scalar_buffer_option2(A: T.Buffer[(64,), "float32"]):
+    A_vector = T.buffer_decl([16], "float32x4", data=A.data)
+    assert A_vector[0].dtype == "float32x4"
+
+@T.prim_func
+def scalar_load_from_vector_buffer(A: T.Buffer[(16,), "float32x4"]):
+    A_scalar = T.buffer_decl([64], "float32", data=A.data)
+    assert A_scalar[0].dtype == "float32"
+
+#multiple dimensional buffer accesses
+@T.prim_func
+def nd_scalar_load_from_scalar_buffer(A: T.Buffer[(64, 64,), "float32"]):
+    assert A[0, 0].dtype == "float32"
+
+@T.prim_func
+def nd_vector_load_from_scalar_buffer(A: T.Buffer[(64,64), "float32"]):
+    assert A[0, T.ramp(0, 1, 4)].dtype == "float32x4"
+```
+
+In rare cases, vector index can be used to access a vector buffer. We may leave this usage as
+undefinedUnless we have a clear use case.
+
+**VectorBufferRewrite**
+
+In some backend like SPIR-V where runtime pointer casts are not available, even between types that
+differ only in the number of lanes (e.g. `float16` and `float16x4.`) `VectorTypeRewriter` will be
+used to rewrite the buffer to a vector type. (VectorBufferRewrite rewrites the buffer from
+`vector_load_from_scalar_buffer` into `scalar_load_from_vector_buffer` in the above example).
+
+**Removing pre-flattened buffer**
+
+Buffer information before flattening are necessary during compilation. They specify the calling
+convention of `PrimFunc` and are translated to assertions of buffer shapes, strides, etc. in
+runtime.
+
+During the lowering process, although buffer accesses inside `PrimFunc` are flattened to match
+physical buffer dimensions, the calling convention of the `PrimFunc` are kept unchanged - It still
+expect the parameter to have multi-dimensional logical buffer shape. Therefore, we would like to
+unify `preflattened_buffer_map` and `buffer_map`. `buffer_map` should be kept unchanged during
+buffer flattening. Instead, we declare an aliasing buffer as the flattened buffer after flattening.
+For example, after flattening, the TIR will look like
+
+```python
+def fn(X: T.Buffer([2, 3], "float32"):
+  X_flattened = T.decl_buffer(X.data, [6], "float32")
+  for i in grid(6):
+    X_flattened[i] = ....
+```
+
+#Pass Checklist
+
+Here are a list of TIR passes that can be impacted significantly when migrating from `Load/Store` to
+`BufferLoad/BufferStore`.
+
+- `StorageFlatten` / `FlattenBuffer`: These passes flatten buffer to physical dimensions. As
+discussed above, they should create flattened buffer via `T.buffer_decl` while keeping `buffer_map`
+unchanged (see the discussion in *Removing pre-flattened buffer* section). Any subsequent passes
+that rewrite buffer, such as, `InjectDoubleBuffer`, `InjectVirtualThread` , should operate on
+physical buffers and should not changing the number of buffer dimensions. `Allocate` after
+flattening will reflect physical buffer dimensions. Alternatively, these passes could be made
+simpler by moving them to occur before the buffer is flattened.  For example, implementing
+`InjectDoubleBuffer` by changing the shape to `[2, *old_shape]`, and accessing using `[i%2,
+*old_indices]`.  That would limit the size/stride handling to occur only during  buffer flattening.
+- VectorizeLoop: This pass should rewrite buffer indices to `Ramp` for vectorized accesses, should
+consider limiting vector index as the last dimension.
+- `StorageRewrite`: This pass should be extended to handle N-D physical buffer.
+- `VectorTypeRewriter` should also consider limiting vector index as the last dimension.
+- `MakePackedAPI`: This pass adds additional parameters (variables) to `PrimFunc` according to the
+FFI calling convention. These variables can no longer be used in `Load` directly. Buffer should be
+declared and then `BufferLoad` should be used  to access values of these parameters.
+- `LowerThreadAllreduce`: This pass is involved with a few buffer rewriting. Need to check buffer
+declarations / accesses follow the new convention here.
+
+# Conclusion and Key takeaways
+- `buffer_decl` creates buffer alias, it is important to consider implications and use
+`buffer_decl` properly. Passes that transform buffers should consider how to buffer alias.
+Therefore we should be able to have a unified method called `T.buffer_decl` in both TIR and
+TVMScript.
+- There are several way for buffer definition, `T.buffer_decl, T.match_buffer, T.alloc_buffer`.
+- `BufferLoad/BufferStore` can be generalized to allow `Ramp` as part of the index.
+- `buffer_decl` can be an alternative of `preflattened_buffer_map`.

Review Comment:
   ```suggestion
   - `buffer_decl` is going to be used to declare flattened Buffer aliases, and  `preflattened_buffer_map` will be removed.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org