Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/02/04 01:45:45 UTC
[GitHub] [tvm-rfcs] areusch commented on a change in pull request #46: Module Based Model Runtime for AOT

areusch commented on a change in pull request #46:
URL: https://github.com/apache/tvm-rfcs/pull/46#discussion_r799099853



##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+# Module-based Model Runtime Interface for AOT
+
+- Feature Name: module_based_model_runtime_for_aot
+- Start Date: 2021-09-17
+- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# **Summary**
+
+This RFC describes a [Module-based Model Runtime
+interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for
+the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby
+enabling its use from the TVM C++ Runtime.
+
+# **Motivation**
+
+The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled
+Relay models. At the time of writing, it's now possible to codegen a TIR function which executes
+Relay models that have known shapes, don't have graph-level control flow, and execute only on the
+CPU device. Right now, the C runtime is the only such runtime environment which can interact with
+this generated code. However, significant interest exists in enabling the C++ runtime to use the
+Ahead-of-Time executor.
+
+# **Guide-level explanation**
+
+Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow
+(e.g. `tvm.relay.build`) by including `--executor=aot` in the Target
+[1]. The return value of `tvm.relay.build` in this case is an `AotExecutorFactory` Module
+object. Users instantiate the AOT executor via `AotExecutorFactory` as they do with `GraphExecutor`:
+
+```python
+import tvm
+from tvm import relay
+
+ir_mod = tvm.parser.fromtext("""\
+      #[version = "0.0.5"]
+      def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
+          %0 = %a + %b;
+          %0
+      }"""
+    )
+
+with tvm.transform.PassContext(opt_level=3):
+  # Returns the proposed AotExecutorFactory.
+  factory = tvm.relay.build(
+       ir_mod, "llvm -executor=aot", mod_name="my_mod")
+
+# Instantiate the proposed AotExecutor on the CPU device.
+aot_executor = factory["my_mod"](tvm.cpu(0))
+```
+
+`AotExecutor` supports the traditional Module-Based Model Runtime Interface and can be used just as
+a user would normally use `GraphExecutor`:
+
+```python
+import numpy as np
+
+# Inputs must match the declared (1, 2) uint8 tensor type.
+aot_executor.set_input("a", tvm.nd.array(np.array([[1, 2]], dtype="uint8")))
+aot_executor.set_input("b", tvm.nd.array(np.array([[3, 5]], dtype="uint8")))
+aot_executor.run()
+output = aot_executor.get_output(0)
+np.testing.assert_array_equal(output.asnumpy(), np.array([[4, 7]], dtype="uint8"))
+```
+
+[1] NOTE: The target string is not the final place this customization should be made. However, it's
+been the place where we've been putting runtime-related stuff. A separate RFC will split the Target
+string into Target options (which affect tuning) and runtime options.
+
+# **Reference-level explanation**
+
+Already committed to TVM is the AotExecutorCodegen. This module produces a TIR top-level function
+which invokes the Relay operators (implemented in TIR) in a correct order. An example is given
+below:
+
+```bash
+PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} {
+  // attr [(nullptr)] device_id = 0
+  // attr [(nullptr)] device_type = 1
+  tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output)
+}
+```
+
+The AotExecutor then needs to accomplish the following to implement the Module-based Model Runtime
+Interface:
+
+1. Allocate input and output tensors as defined in the `run_model` function using the correct Device

Review comment:
       You mean that AOTExecutorCodegen does not emit information to determine the device, correct? I think right now it does, in the sense that it hardcodes the device to (kDLCPU, 0) wherever it needs one, right?
   
   Here I'm not suggesting we modify the codegen to create tir.allocate nodes in the body of the emitted function. I'm only proposing that we emit metadata containing shape, dtype, and device information for the expected inputs and outputs of that main function, and then defer allocation to either a runtime component (in the case of the C++ runtime) or a compile-time component (if used with the C runtime and microTVM).
   
   The runtime changes proposed here align with E1. We could also implement `set_input_zero_copy` to align with E2. But again, these are runtime changes that require the additional output from AOTExecutorCodegen to work properly.
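   
   To make the metadata idea concrete, here is a rough Python sketch (not TVM code). `ParamInfo`, `DTYPE_BYTES`, and `allocate_from_metadata` are hypothetical names used only for illustration; the one detail taken from the current codegen output is that the device is fixed to (kDLCPU, 0). The compiler side would emit one record per input and output, and a separate component would allocate from those records:

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the proposed metadata; none of these are TVM APIs.
DTYPE_BYTES = {"uint8": 1, "int32": 4, "float32": 4}
K_DL_CPU = 1  # numeric value of DLPack's kDLCPU device type


@dataclass(frozen=True)
class ParamInfo:
    """One record per input or output of the generated run_model function."""
    name: str
    shape: Tuple[int, ...]
    dtype: str
    device: Tuple[int, int] = (K_DL_CPU, 0)  # (device_type, device_id)


def allocate_from_metadata(params: List[ParamInfo]) -> Dict[str, bytearray]:
    """Allocate one zero-filled buffer per parameter, using only the metadata."""
    buffers = {}
    for p in params:
        if p.device[0] != K_DL_CPU:
            # A real runtime would dispatch to the matching Device API here.
            raise NotImplementedError(f"no allocator for device {p.device}")
        buffers[p.name] = bytearray(prod(p.shape) * DTYPE_BYTES[p.dtype])
    return buffers


# Metadata the codegen might emit for the two-input add example in the RFC.
run_model_params = [
    ParamInfo("a", (1, 2), "uint8"),
    ParamInfo("b", (1, 2), "uint8"),
    ParamInfo("output", (1, 2), "uint8"),
]
buffers = allocate_from_metadata(run_model_params)
```

   The point of the split is that the emitted function body stays free of allocation logic: the same records could drive `NDArray` allocation in the C++ runtime or static buffer sizing at compile time for microTVM.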

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+The AotExecutor then needs to accomplish the following to implement the Module-based Model Runtime
+Interface:
+
+1. Allocate input and output tensors as defined in the `run_model` function using the correct Device
+   API.
+2. Provide a mapping from Relay parameter name to positional argument.
+3. Invoke the generated TIR function and provide profiling.
+
+### Compiler ↔ Runtime Metadata
+
+In order to implement (1) and (2) above, additional metadata about the `run_model` function needs to
+be communicated from Compiler to Runtime:
+
+- The mapping between Relay parameter name and TIR argument position
+- The number of inputs and outputs
+- The type of each parameter
+- Information sufficient to choose a Device API to allocate memory for that data.
+
+At present, Metadata is passed from Compiler to Runtime in several different ways:
+
+1. Constant DLTensor can be bundled with code and supplied to `runtime::Module` via
+   `runtime::MetadataModule`.
+2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, `sdaccel`, `rocm`,
+   `vulkan`) have adopted the convention of including a
+   [`runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106)
+   (NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their serialization:
+
+    ```c++
+    /*! \brief function information needed by device */
+    struct FunctionInfo {
+      std::string name;
+      std::vector<DLDataType> arg_types;
+      std::vector<std::string> launch_param_tags;
+    };
+    ```
+
+3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the
+   graph-level
+   [`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55):
+
+    ```c++
+    /*!
+     * \brief Structure that can be optionally used by the executor codegen
+     */
+    class MetadataNode : public Object {
+     public:
+      /*! \brief input information for the main function */
+      Array<String> inputs;
+      /*! \brief number of outputs of the main function */
+      int num_outputs = 1;
+      /*! \brief the executor to be used to run the model */
+      String executor = kTvmExecutorGraph;
+
+      String mod_name = "";
+    };
+    ```
+
+4. The recent AOTExecutor implementation has created `tvm::relay::transform::FunctionInfo` which
+   communicates statistics about memory usage and I/O operation for each TIR operator and aggregate
+   statistics for the top-level AOT function:
+
+    ```c++
+    struct FunctionInfoNode : public Object {
+      Map<Target, Integer> workspace_sizes;
+      Map<Target, Integer> io_sizes;
+      Map<Target, Integer> constant_sizes;
+      Map<Target, tir::PrimFunc> tir_primfuncs;
+      Map<Target, Function> relay_primfuncs;
+    };
+    ```
+
+
+Some duplication of information is already present. Likely this is due in part to the existing
+middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another
+factor may be: since each `runtime::Module` is responsible for its own serialization, and passing
+`Node` across `PackedFunc` requires a cast, the lack of a centralized facility for
+`runtime::Module`s to obtain module-level Metadata has led backend authors to roll their own. This
+pattern means that it's very difficult to assess the full scope of metadata handed to the runtime,
+particularly across all backends.
+
+Work is currently ongoing to unify the pre-codegen `IRModule` into a single instance. After this
+work is completed, it will be much easier to produce a centralized module-level Metadata. This RFC
+argues for the expansion of `runtime::MetadataNode` in the following ways:

Review comment:
       how so?

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+Work is currently ongoing to unify the pre-codegen `IRModule` into a single instance. After this
+work is completed, it will be much easier to produce a centralized module-level Metadata. This RFC
+argues for the expansion of `runtime::MetadataNode` in the following ways:
+
+1. Rename `runtime::MetadataModule` to `runtime::ConstLoaderModule` to disambiguate the two and make
+   its purpose in life clearer.
+2. Expand `input_args` in the existing `runtime::Metadata` to parity with `runtime::FunctionInfo`,
+   plus include `_sizes` from `tvm::relay::transform::FunctionInfoNode` and the required `shape` and
+   `dtype` information from the beginning of this section.
+3. Introduce `ModelMetadataModule` to contain this information for use with the C++ runtime.
+
+    ```c++
+    class ModelMetadataModule {
+      virtual PackedFunc GetFunction(const std::string& name,
+                                     const ObjectPtr<Object>& sptr_to_self) {
+        if (name == "get_model_metadata") {
+           // Capture sptr_to_self so the module outlives the returned PackedFunc.
+           return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+              *rv = ModelMetadata(metadata_);
+           });
+        } else {
+          return PackedFunc();
+        }
+      }
+
+      const struct ModelMetadata* metadata_;
+    };
+    ```
+
+4. Introduce an optional implementation for the C runtime.
+5. Export runtime::Metadata to Model Library Format.
+
+The new proposed definition of `runtime::Metadata` is as follows. NOTE that this is a C definition
+because it will be made available to both the C and C++ runtimes. A C++ wrapper will be written.
+
+```c
+struct ParameterInfo {
+  const char* relay_name_hint;
+  const char* tir_name_hint;
+  int64_t* shape;
+  int64_t ndim;
+  DLDataType dtype;
+  TargetDevice target_device;  // NOTE: future addition; not covered in this RFC.
+};
+
+struct FunctionInfo {
+  const char* function_name;
+  struct ParameterInfo* params;
+  int num_inputs;
+  int num_outputs;
+  int64_t workspace_size_bytes;
+  int64_t io_size_bytes;
+  int64_t constant_size_bytes;
+};
+
+typedef struct Metadata {
+  int version;
+  struct FunctionInfo* functions;
+  const char* module_name;
+} Metadata;
+```
+
+### Internal workings of AotExecutor (`--runtime=c++ --interface-api=packed`)
+
+Given the above, we can now sketch out the way AotExecutor should behave (for C++ runtime).
+
+Module initialization will:
+
+1. Load the `ModelMetadata` using `get_model_metadata` PackedFunc.
+2. Allocate space for the parameters to `tvmgen_<model_name>_run_model`.
+3. Look up and load any linked parameters using the `--link-params` mechanism.
+
+- `set_input`, `get_input`, `get_output` all work as they do in `GraphExecutor`.
+- `run` assembles `TVMArgs` containing inputs + outputs and invokes `tvmgen_<model_name>_run_model`.
+- `time_evaluator` is implemented in the same way as it is in `GraphExecutor`. Timing `run_model` is
+  done using the CPU timer.
+
+### Internal workings of AotExecutor (`--runtime=c --interface-api=packed`)
+
+The C runtime version works in a very similar way with C accessor functions for the `ModelMetadata`.
+
+### No AotExecutor implementation planned (`--runtime=c --interface-api=c`)
+
+When `--interface-api=c` is present in the Target string, the `run_model` function no longer accepts
+the PackedFunc interface and instead accepts `arg_values` directly as positional args:
+
+```c
+TVM_DLL int32_t tvmgen_default_run_model(void* arg0, void* arg1, void* arg2) {
+  void* input = arg0;
+  void* input1 = arg1;
+  void* output = arg2;
+  (void)tvmgen_default_fused_multiply(input, input1, output);
+  return 0;
+}
+```
+
+Additional work is underway to wrap this in a firmware-friendly interface. A core design goal of
+this interface is to offload all memory management tasks to the calling code to facilitate
+integration with bare-metal embedded devices.
+
+Therefore, it would go against the goals of the C interface to introduce a generic runtime wrapper
+compatible with PackedFunc calling convention. It may be possible to do so in the future, but it

Review comment:
       Agreed. Because this RFC proposes to implement a wrapper around the code-generated PackedFunc interface, I think these concerns are in fact separate. Let me know if you disagree.
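   
   For illustration only, the kind of wrapper I have in mind looks roughly like this in Python. `AotExecutorSketch` and `fake_run_model` are hypothetical; the fake function stands in for the generated `tvmgen_<model_name>_run_model` and is not how TVM invokes it. The wrapper owns the name-to-position mapping and the buffers, and the generated function is only ever called positionally:

```python
from typing import Callable, List, Optional, Sequence


class AotExecutorSketch:
    """Hypothetical wrapper exposing set_input/run/get_output over a positional run_model."""

    def __init__(self, run_model: Callable[..., int], input_names: Sequence[str],
                 output_sizes: Sequence[int]):
        self._run_model = run_model
        # Mapping from Relay parameter name to positional argument index.
        self._index = {name: i for i, name in enumerate(input_names)}
        self._inputs: List[Optional[List[int]]] = [None] * len(input_names)
        # Outputs are allocated by the wrapper, not by the generated code.
        self._outputs = [[0] * size for size in output_sizes]

    def set_input(self, name: str, value: Sequence[int]) -> None:
        self._inputs[self._index[name]] = list(value)

    def run(self) -> None:
        if any(buf is None for buf in self._inputs):
            raise ValueError("all inputs must be set before run()")
        # Inputs first, then outputs, mirroring the PrimFunc argument order.
        status = self._run_model(*self._inputs, *self._outputs)
        if status != 0:
            raise RuntimeError(f"run_model returned {status}")

    def get_output(self, index: int) -> List[int]:
        return self._outputs[index]


def fake_run_model(a: List[int], b: List[int], out: List[int]) -> int:
    """Stand-in for the generated function: elementwise uint8 add into `out`."""
    for i in range(len(out)):
        out[i] = (a[i] + b[i]) % 256
    return 0


executor = AotExecutorSketch(fake_run_model, ["a", "b"], [2])
executor.set_input("a", [1, 2])
executor.set_input("b", [3, 5])
executor.run()
result = executor.get_output(0)
```

   Nothing here touches the `--interface-api=c` entry point, which is why I consider the two concerns separate.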

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the
+   graph-level
+   [`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55):

Review comment:
       ah updated the RFC text to point to that. thanks!

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+# Module-based Model Runtime Interface for AOT
+
+- Feature Name: module_based_model_runtime_for_aot
+- Start Date: 2021-09-17
+- RFC PR: [apache/tvm-rfcs#0046](https://github.com/apache/tvm-rfcs/pull/0046)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+# **Summary**
+
+This RFC describes a [Module-based Model Runtime
+interface](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) for
+the [Ahead-of-Time Executor](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), thereby
+enabling its use from the TVM C++ Runtime.
+
+# **Motivation**
+
+The microTVM project has made significant progress towards an Ahead-of-Time Executor for compiled
+Relay models. At the time of writing, it's now possible to codegen a TIR function which executes
+Relay models that have known shapes, don't have graph-level control flow, and execute only on the
+CPU device. Right now, the C runtime is the only such runtime environment which can interact with
+this generated code. However, significant interest exists in enabling the C++ runtime to use the
+Ahead-of-Time executor.
+
+# **Guide-level explanation**
+
+Users select the AOT executor at compile time through the traditional GraphExecutor compilation flow
+(e.g. `[tvm.relay.build](http://tvm.relay.build)`) by including `--executor=aot` in the Target
+[1]. The return value of `tvm.relay.build` in this case is an `AotExecutorFactory` Module
+object. Users instantiate the AOT executor via `AotExecutorFactory` as they do with `GraphExecutor`:
+
+```bash
+ir_mod = tvm.parser.fromtext("""\
+      #[version = "0.0.5"]
+      def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
+          %0 = %a + %b;
+          %0
+      }"""
+    )
+
+with PassConfig(opt_level=3):
+  factory : AotExecutorFactory = tvm.relay.build(
+       ir_mod, "llvm -executor=aot", module_name="my_mod")
+
+aot_executor : AotExecutor = factory["my_mod"](tvm.cpu(0))
+```
+
+`AotExecutor` supports the traditional Module-Based Model Runtime Interface and can be used as a
+user normally would `GraphExecutor`:
+
+```bash
+aot_executor.set_input("a", tvm.nd.array(np.ndarray([1, 2], dtype="uint8")))
+aot_executor.set_input("b", tvm.nd.array(np.ndarray([3, 5], dtype="uint8")))
+aot_exec.run()
+output = aot_exec.get_output(0)
+assert output.asnumpy() == np.ndarray([5, 7], dtype="uint8")
+```
+
+[1] NOTE: The target string is not the final place this customization should be made. However, it's
+been the place where we've been putting runtime-related stuff. A separate RFC will split the Target
+string into Target options (which affect tuning) and runtime options.
+
+# **Reference-level explanation**
+
+Already committed to TVM is the AotExecutorCodegen. This module produces a TIR top-level function
+which invokes the Relay operators (implemented in TIR) in a correct order. An example is given
+below:
+
+```bash
+PrimFunc([input1, input2, output]) attrs={"global_symbol": "tvmgen_my_mod_run_model", "runner_function": (bool)1} {
+  // attr [(nullptr)] device_id = 0
+  // attr [(nullptr)] device_type = 1
+  tir.tvm_call_packed("tvmgen_my_mod_fused_add", input1, input2, output)
+}
+```
+
+The AotExecutor then needs to accomplish the following to meet Module-based Model Runtime Interface:
+
+1. Allocate input and output tensors as defined in the `run_model` function using the correct Device
+   API.
+2. Provide a mapping from relay parameter name to positional argument.
+3. Invoke the generated TIR function and provide profiling.
+
+### Compiler ↔ Runtime Metadata
+
+In order to implement (1) and (2) above, additional metadata about the `run_model` function needs to
+be communicated from Compiler to Runtime:
+
+- The mapping between Relay parameter name and TIR argument position
+- The number of inputs and outputs
+- The type of each parameter
+- Information sufficient to choose a Device API to allocate memory for that data.
+
+At present, Metadata is passed from Compiler to Runtime in several different ways:
+
+1. Constant `DLTensor`s can be bundled with code and supplied to `runtime::Module` via
+   `runtime::MetadataModule`.
+2. Many non-DSO-exportable backends (`cuda`, `hexagon`, `metal`, `opencl`, `sdaccel`, `rocm`,
+   `vulkan`) have adopted the convention of including a
+   [`runtime::FunctionInfo`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L106)
+   (NOTE: distinct from `tvm::relay::transform::FunctionInfo`) in their serialization:
+
+    ```bash
+    /*! \brief function information needed by device */
+    struct FunctionInfo {
+      std::string name;
+      std::vector<DLDataType> arg_types;
+      std::vector<std::string> launch_param_tags;
+    };
+    ```
+
+3. AotExecutorCodegen and GraphExecutorCodegen have adopted the practice of producing the
+   graph-level
+   [`runtime::MetadataNode`](https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L55):
+
+    ```bash
+    /*!
+     * \brief Structure that can be optionally used by the executor codegen
+     */
+    class MetadataNode : public Object {
+     public:
+      /*! \brief input information for the main function */
+      Array<String> inputs;
+      /*! \brief number of outputs of the main function */
+      int num_outputs = 1;
+      /*! \brief the executor to be used to run the model */
+      String executor = kTvmExecutorGraph;
+
+      String mod_name = "";
+    };
+    ```
+
+4. The recent AotExecutor implementation has created `tvm::relay::transform::FunctionInfo`, which
+   communicates statistics about memory usage and I/O operation for each TIR operator, plus
+   aggregate statistics for the top-level AOT function:
+
+    ```bash
+    struct FunctionInfoNode : public Object {
+      Map<Target, Integer> workspace_sizes;
+      Map<Target, Integer> io_sizes;
+      Map<Target, Integer> constant_sizes;
+      Map<Target, tir::PrimFunc> tir_primfuncs;
+      Map<Target, Function> relay_primfuncs;
+    };
+    ```
+
+
+Some duplication of information is already present. Likely this is due in part to the existing
+middle-end compiler design, in which a separate `IRModule` is produced for each backend. Another

Review comment:
       a couple of points:
   - more than one `runtime.Module` can be produced by a given backend
   - depending on the load path in the C++ runtime, the tree can become rearranged; see https://discuss.tvm.apache.org/t/introduce-artifact-a-container-for-generated-code/11099

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+factor may be that, since each `runtime::Module` is responsible for its own serialization and
+passing `Node` across `PackedFunc` requires a cast, the lack of a centralized facility for

Review comment:
       `tvm::Node`; clarified

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+`runtime::Module`s to obtain module-level Metadata has led backend authors to roll their own. This
+pattern makes it very difficult to assess the full scope of metadata handed to the runtime,
+particularly across all backends.
+
+Work is currently ongoing to unify the pre-codegen `IRModule` into a single instance. After this
+work is completed, it will be much easier to produce a centralized module-level Metadata. This RFC
+argues for the expansion of `runtime::MetadataNode` in the following ways:
+
+1. Rename `runtime::MetadataModule` to `runtime::ConstLoaderModule` to disambiguate the two and make
+   its purpose in life clearer.
+2. Expand `input_args` in the existing `runtime::Metadata` to parity with `runtime::FunctionInfo`,

Review comment:
       ah yeah. i think perhaps this is done now, then, since ExecutorCodegenMetadata exports tir::Var. However, to then fit this proposal we need to reduce it to something exportable. updated the text to clarify, see what you think.

##########
File path: rfcs/0046-module-based-model-runtime-for-aot.md
##########
@@ -0,0 +1,348 @@
+   plus include `_sizes` from `tvm::relay::transform::FunctionInfoNode` and the required `shape` and
+   `dtype` information from the beginning of this section.
+3. Introduce `ModelMetadataModule` to contain this information for use with the C++ runtime.
+
+    ```bash
+    class ModelMetadataModule : public runtime::ModuleNode {
+     public:
+      PackedFunc GetFunction(const std::string& name,
+                             const ObjectPtr<Object>& sptr_to_self) final {
+        if (name == "get_model_metadata") {
+          // Capture sptr_to_self so the module outlives the returned PackedFunc.
+          return PackedFunc([this, sptr_to_self](TVMArgs args, TVMRetValue* rv) {
+            *rv = ModelMetadata(metadata_);
+          });
+        } else {
+          return PackedFunc();
+        }
+      }
+
+     private:
+      const struct ModelMetadata* metadata_;
+    };
+    ```
+
+4. Introduce an optional implementation for the C runtime.
+5. Export `runtime::Metadata` to Model Library Format.
+
+The new proposed definition of `runtime::Metadata` is as follows. NOTE that this is a C definition
+because it will be made available to both the C and C++ runtimes. A C++ wrapper will be written.
+
+```bash
+struct ParameterInfo {
+  const char* relay_name_hint;
+  const char* tir_name_hint;
+  int64_t* shape;
+  int64_t ndim;
+  DLDataType dtype;
+  TargetDevice target_device;  // NOTE: future addition; not covered in this RFC.
+};
+
+struct FunctionInfo {
+  const char* function_name;
+  struct ParameterInfo* params;
+  int num_inputs;
+  int num_outputs;
+  int64_t workspace_size_bytes;
+  int64_t io_size_bytes;
+  int64_t constant_size_bytes;
+};
+
+typedef struct Metadata {
+  int version;
+  struct FunctionInfo* functions;
+  const char* module_name;
+} Metadata;
+```
+
+### Internal workings of AotExecutor (`--runtime=c++ --interface-api=packed`)
+
+Given the above, we can now sketch out the way AotExecutor should behave (for C++ runtime).
+
+Module initialization will:
+
+1. Load the `ModelMetadata` using `get_model_metadata` PackedFunc.
+2. Allocate space for the parameters to `tvmgen_<model_name>_run_model`.
+3. Look up and load any linked parameters using the `--link-params` mechanism.
+
+- `set_input`, `get_input`, `get_output` all work as they do in `GraphExecutor`.
+- `run` assembles `TVMArgs` containing inputs + outputs and invokes `tvmgen_<model_name>_run_model`.
+- `time_evaluator` is implemented in the same way as it is in `GraphExecutor`. Timing `run_model` is
+  done using the CPU timer.
+
+### Internal workings of AotExecutor (`--runtime=c --interface-api=packed`)
+
+The C runtime version works in a very similar way with C accessor functions for the `ModelMetadata`.

Review comment:
       i think if we are not changing the main function emitted by AotExecutorCodegen, this concern is resolved--but, please let me know if I understood it correctly. The main goal of this section is to sketch out how a possible implementation could work under these conditions, to ensure the metadata refactor is broadly applicable.
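
To make the metadata-driven flow discussed here more concrete, the following is a rough pure-Python sketch of how an executor might use the proposed `ParameterInfo` and `FunctionInfo` records to map a Relay parameter name to a positional argument and to size the argument buffers. It is not TVM code: the Python classes only mirror the field names of the proposed C structs, `dtype_bits` is a simplified stand-in for `DLDataType`, the helper names (`input_index`, `nbytes`, `allocate_args`) are hypothetical, and the ordering of `params` (inputs first, then outputs) is an assumption based on the `PrimFunc([input1, input2, output])` example in the RFC.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass(frozen=True)
class ParameterInfo:
    """Python stand-in for the proposed C `struct ParameterInfo`."""
    relay_name_hint: str
    tir_name_hint: str
    shape: Tuple[int, ...]
    dtype_bits: int  # assumption: simplified DLDataType (bit width only, one lane)


@dataclass(frozen=True)
class FunctionInfo:
    """Python stand-in for the proposed C `struct FunctionInfo`."""
    function_name: str
    params: Tuple[ParameterInfo, ...]  # assumption: inputs first, then outputs
    num_inputs: int
    num_outputs: int


def input_index(fn: FunctionInfo, relay_name: str) -> int:
    """Map a Relay parameter name to its positional argument index."""
    for i, param in enumerate(fn.params[:fn.num_inputs]):
        if param.relay_name_hint == relay_name:
            return i
    raise KeyError(relay_name)


def nbytes(param: ParameterInfo) -> int:
    """Bytes needed to back one tensor, rounding the element size up to whole bytes."""
    return prod(param.shape) * ((param.dtype_bits + 7) // 8)


def allocate_args(fn: FunctionInfo) -> List[bytearray]:
    """Allocate one zeroed buffer per positional argument (inputs, then outputs)."""
    return [bytearray(nbytes(p)) for p in fn.params]


# Hypothetical metadata for the two-input add model quoted in the RFC.
run_model = FunctionInfo(
    function_name="tvmgen_my_mod_run_model",
    params=(
        ParameterInfo("a", "input1", (1, 2), 8),
        ParameterInfo("b", "input2", (1, 2), 8),
        ParameterInfo("output", "output", (1, 2), 8),
    ),
    num_inputs=2,
    num_outputs=1,
)

args = allocate_args(run_model)
# Analogous to set_input("b", ...): resolve the name, then fill that buffer.
args[input_index(run_model, "b")][:] = bytes([3, 5])
```

In this sketch a `run` call would simply pass `args` positionally to the generated function, which is why the metadata only needs names, shapes, dtypes and the input/output counts.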



