Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/04/01 16:32:49 UTC

[GitHub] [tvm] giuseros opened a new pull request #7785: [AOT] Introducing AOT in TVM

giuseros opened a new pull request #7785:
URL: https://github.com/apache/tvm/pull/7785


   This change adds the code generation and a minimal runtime API to use the
   Ahead Of Time (AOT) compilation flow. The main logic is contained in:
   
   - src/relay/backend/aot_codegen.cc
   
   which produces a TIR PrimFunc by traversing the Relay graph.
   
   The runtime interface (authored by @mousius) leaves a gap for future
   iterations to use platform-specific features from an RTOS.
   
   Currently AOT runs successfully on x86 in a host OS; running these
   tests on micro is coming soon.
   
   This PR is based on the RFC described here: https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206
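   
   To make the intended usage concrete, here is a minimal sketch of driving a
   generated AOT entry point from application code. The entry-point name and
   signature below are illustrative assumptions for this sketch, not the final
   runtime API:
   
   ```c
   #include <stdint.h>
   
   /* Hypothetical generated runner: the AOT codegen emits a single
    * TIR-derived function that calls the lowered operators in order.
    * Name and argument layout here are placeholders for illustration. */
   extern int32_t tvm__run_func(void** inputs, void** outputs);
   
   int main(void) {
     static float input_data[28 * 28]; /* network input buffer */
     static float output_data[10];     /* network output buffer */
     void* inputs[] = {input_data};
     void* outputs[] = {output_data};
     /* Run a single inference; a non-zero return indicates failure. */
     return tvm__run_func(inputs, outputs) != 0;
   }
   ```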
   
   
   Co-authored-by: Christopher Sidebottom <Ch...@arm.com>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615928298



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -324,15 +343,20 @@ inline void CodeGenCHost::PrintTernaryCondExpr(const T* op, const char* compare,
 }
 
 runtime::Module BuildCHost(IRModule mod, Target target) {
+  bool is_aot_executor =

Review comment:
    I removed the `is_aot_executor`, but I kept the idea of having the runner function as the last function to be generated; this way the generated code is a bit more readable.
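    
    To illustrate, the emitted translation unit would be laid out roughly like this (a sketch with invented operator names, not actual codegen output):
    
    ```c
    /* Lowered operator functions are emitted first. */
    static int32_t fused_nn_conv2d(void* args, void* arg_type_ids, int32_t num_args) {
      /* ... lowered operator body ... */
      return 0;
    }
    
    static int32_t fused_nn_dense(void* args, void* arg_type_ids, int32_t num_args) {
      /* ... lowered operator body ... */
      return 0;
    }
    
    /* The runner function comes last, so every callee is already defined. */
    int32_t run_func(void* args, void* arg_type_ids, int32_t num_args) {
      if (fused_nn_conv2d(args, arg_type_ids, num_args) != 0) return -1;
      return fused_nn_dense(args, arg_type_ids, num_args);
    }
    ```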







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615042963



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -126,20 +125,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
 
     Parameters
     ----------
-    mod : tvm.relay.backend.graph_executor_factory.GraphExecutorFactoryModule
+    mod : tvm.relay.backend.executor_factory.ExecutorFactoryModule
         The return value of tvm.relay.build, which will be exported into Model Library Format.
     file_name : str
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = isinstance(mod, executor_factory.AOTExecutorFactoryModule)

Review comment:
    Oh, I'm sorry, I misread. Ignore :)







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615041000



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -324,15 +343,20 @@ inline void CodeGenCHost::PrintTernaryCondExpr(const T* op, const char* compare,
 }
 
 runtime::Module BuildCHost(IRModule mod, Target target) {
+  bool is_aot_executor =

Review comment:
       I can try to print prototypes. I remember that when I tried I had some issues, but I can try again. 







[GitHub] [tvm] manupa-arm commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-828430178


   > Hi @areusch, @tqchen, @manupa-arm,
   > About the target vs build discussion: I think we all agree that the executor should not be part of the target but should instead be a build parameter.
   > 
   > The question is whether to add the build parameter now or in another PR. The point is that the executor is really only used in `build_module.cc`, so making it a build parameter seems the best choice. This also avoids hacky workarounds in situations where the `target_host` is not defined (e.g., `cuda`).
   > 
   > I understand the argument to leave it as a target option now and then move all the target options into an options object in a later PR. But I would prefer to reduce the number of hacks for AOT from day 1, and in a later PR align crt and link-params with AOT. In other words, if there are no drawbacks, let's make AOT the "proper" way first and then address the other target options so they align with AOT. Thoughts?
   
   My two cents on this: since for now "executor" is only consumed by relay build, I would prefer it to be an argument rather than putting it in the target (between the two options of having it inside the target vs. a relay.build arg).
   
   I agree with the general direction that we need a way to convey compilation flags that are not strictly associated with the target. However, the options described here (runtime, link-params) run deeper than relay.build, so when those are refactored into a compiler options "object", we could also make the executor part of it. What do others think?
   (I think what we are discussing is where to put the "executor" until the compiler options "object" is introduced.)





[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r623740685



##########
File path: include/tvm/tir/builtin.h
##########
@@ -343,6 +343,16 @@ TVM_DLL const Op& tvm_stack_make_array();
  */
 TVM_DLL const Op& tvm_call_packed();
 
+/*!
+ * \brief See pseudo code
+ *
+ *  int tvm_call_cpacked(fname, TVMValue* args) {
+ *     (*fname)(args, type_code_of(args), len(args));
+ *     return 0;
+ *  }
+ */
+TVM_DLL const Op& tvm_call_cpacked();

Review comment:
       Given #7932 has been merged, I updated the docstring
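    
    For context, a rough approximation in C of what the two builtins lower to (a hand-written sketch; `fused_add` and `mod_ctx` are placeholders): `tvm_call_packed` resolves the callee by name through the module context at run time, while `tvm_call_cpacked` calls the symbol directly, which is what makes it suitable for AOT.
    
    ```c
    #include <tvm/runtime/c_runtime_api.h>
    #include <tvm/runtime/c_backend_api.h>
    
    /* Provided by the generated module when calls are emitted as cpacked. */
    extern int32_t fused_add(TVMValue* args, int* type_codes, int32_t num_args);
    
    static int32_t call_both_ways(void* mod_ctx, TVMValue* args, int* type_codes, int num_args) {
      /* tvm_call_packed: look the function up by name, then call indirectly. */
      static TVMFunctionHandle fused_add_handle = NULL;
      TVMValue ret_val;
      int ret_tcode;
      if (fused_add_handle == NULL &&
          TVMBackendGetFuncFromEnv(mod_ctx, "fused_add", &fused_add_handle) != 0) {
        return -1;
      }
      if (TVMFuncCall(fused_add_handle, args, type_codes, num_args, &ret_val, &ret_tcode) != 0) {
        return -1;
      }
    
      /* tvm_call_cpacked: call the symbol directly, no registry or lookup. */
      return fused_add(args, type_codes, num_args);
    }
    ```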







[GitHub] [tvm] tkonolige commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r607299180



##########
File path: include/tvm/runtime/crt/aot/tvm_backend.h
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_backend.h
+ * \brief Backend functions for the AOT executor
+ *
+ * These are not designed to be user-facing and may change without warning
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include "tvm_error.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*! Memory alignment for allocator */
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT
+#define TVM_RUNTIME_ALLOC_ALIGNMENT 16

Review comment:
       Why is this different from `kAllocAlignment`?







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611683529



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // A Var node can only produce a single output

Review comment:
       Got it. Thanks







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r617637024



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       I think "executor" should technically not be attribute of the target_host rather a parameter for relay.build(...). I think we should just plumb it through relay.build(.., executor=graph) while defaulting to graph -- thus its not a breaking change. 
   cc : @areusch 







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r610703811



##########
File path: include/tvm/runtime/crt/aot/tvm_backend.h
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_backend.h
+ * \brief Backend functions for the AOT executor
+ *
+ * These are not designed to be user-facing and may change without warning
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_

Review comment:
    So, this was a controversial subject for us. The main issue we had with reusing the existing `crt` was that we are aiming at a more embedded-friendly code generation process. In particular, we have issues with:
    * TVMBackendMemoryAllocWorkspace taking a uint64_t to specify the size
    * TVMValue being a 64-bit data structure
    * The dependency on DLTensor and DLDevice
    
    However, TVMValue and DLTensor will be removed in the next few patches, and TVMBackendMemoryAllocWorkspace will be the subject of global memory unification. So we refactored the code to fit in `crt`, but I wanted to flag that this brings a lot of baggage that is not essential for embedded devices.







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615006631



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       @giuseros great, I support that so sounds good!







[GitHub] [tvm] giuseros edited a comment on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-828639039


   Hi @areusch,
   So, about solution 2): it also does not affect the logs, I think. Actually, it would be transparent to the logs because we are not touching the target.
   
   A major drawback of 1) is `tvmc`. If we go for 1), the user needs to specify the executor directly in the target via `tvmc`. When we remove it from the target, the `tvmc` command will also have to change.
   
   With 2), we can have a flag in `tvmc`, e.g., `--executor`, and this will translate to whatever `relay.build` command we decide on. When we change `relay.build`, the `tvmc` command won't change.
   
   Also, in general, why add something to the target if we know we want to remove it entirely (and when it is so straightforward to express as a `relay.build` parameter)?





[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608908551



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // A Var node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    reverse_params_lookup_.Set(expr, name);
+
+    // If the Constant node is an output node we need to copy the content of the parameter to the
+    // output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (reverse_params_lookup_.find(kv.first) != reverse_params_lookup_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);

Review comment:
    I think to do that, we'd need to create a new function and a data structure that outlives the life cycle of the function. Not sure if this is fully supported in TIR right now; if not, it can be a TODO.
   
   cc @ZihengJiang 







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619326998



##########
File path: include/tvm/runtime/crt/stack_allocator.h
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+#ifndef TVM_RUNTIME_CRT_STACK_ALLOCATOR_H_
+#define TVM_RUNTIME_CRT_STACK_ALLOCATOR_H_
+#include <stddef.h>
+#include <stdint.h>
+
+#include "error_codes.h"
+
+#define STACK_ALLOCATOR_TAG 0xabcd1234
+#define STACK_ALLOCATOR_TAG_SIZE_BYTES 4
+
+/*! Memory alignment for allocator */
+
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES
+#define TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES 16

Review comment:
       why 16? 

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -36,6 +41,11 @@ void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_
 }
 
 tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
+#ifdef TVM_CRT_DEBUG
+  uint32_t tag = *(((uint32_t*)tvm_runtime_workspace->next_alloc) - 1);
+  uint32_t total_size = (tvm_runtime_workspace->next_alloc - (uint8_t*)ptr);
+  CHECK_EQ(tag, total_size ^ STACK_ALLOCATOR_TAG, "tag did not match");

Review comment:
    Can you make a more informative error message? The user is not likely to understand what "tag" means if this condition is not met. It's also probably good to include `ptr` and maybe `next_alloc`.
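    
    Something along these lines, reusing the variables from the hunk above (a sketch; it assumes the CRT `CHECK_EQ` macro accepts printf-style format arguments):
    
    ```c
    CHECK_EQ(tag, total_size ^ STACK_ALLOCATOR_TAG,
             "StackMemoryManager_Free: tag mismatch for ptr=%p (next_alloc=%p); "
             "workspace memory must be freed in reverse order of allocation",
             ptr, tvm_runtime_workspace->next_alloc);
    ```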

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);

Review comment:
    I think this is a bit confusing to read since you use bitwise arithmetic with a negative number. Can we write it as:
   
   ```
    // Reserve bytes at the end of the allocation such that next_alloc % TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES == 0.
    uint32_t total_size_bytes =
        (nbytes + (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1)) & ~(TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   ```
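    For example, with nbytes = 20 and a 16-byte alignment this gives (20 + 15) & ~15 = 32, so next_alloc stays on a 16-byte boundary.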
   

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
    I can see why this is not strictly production code, but I don't think this should be on only in debug mode. It's pretty easy to get memory corruption in a production system, so I'd propose this be always-on or disable-able by `#define`.
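    
    For instance (a sketch; the macro name is an invented placeholder, defaulted to on, and the function body is approximated from the hunk above):
    
    ```c
    /* Default the lifetime check to on; platforms can opt out explicitly. */
    #ifndef TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK
    #define TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK 1
    #endif
    
    tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
    #if TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK
      uint32_t tag = *(((uint32_t*)tvm_runtime_workspace->next_alloc) - 1);
      uint32_t total_size = (tvm_runtime_workspace->next_alloc - (uint8_t*)ptr);
      CHECK_EQ(tag, total_size ^ STACK_ALLOCATOR_TAG, "tag did not match");
    #endif
      tvm_runtime_workspace->next_alloc = (uint8_t*)ptr;
      return kTvmErrorNoError;
    }
    ```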

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,107 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_executor_config(self):
+        """Common function to return the internal representation
+        the executor relies upon to execute the network
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_params(self):
+        """
+        Sometimes we want to get params explicitly.

Review comment:
       """Return the compiled parameters."""
   
    These are used in the "typical" case where we execute directly in TVM, so I wouldn't use language here that makes it seem like this is an occasionally-used function.

##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -184,7 +184,7 @@ class BuiltinLower : public StmtExprMutator {
   }
   PrimExpr VisitExpr_(const CallNode* op) final {
     if (op->op.same_as(builtin::tvm_call_packed())) {
-      return MakeCallPacked(op, true);
+      return MakeCallPacked(op, /* use_string_lookup */ true);
     } else if (op->op.same_as(builtin::tvm_call_cpacked())) {
       return MakeCallPacked(op, false);

Review comment:
       can you add the `/* use_string_lookup */` here too?

##########
File path: tests/python/relay/test_backend_graph_executor.py
##########
@@ -148,6 +148,12 @@ def test_plan_memory():
     assert len(device_types) == 1
     assert len(storage_sizes) == 4
 
+    # Check the specific size of each sid

Review comment:
    It might be better to assert on storage_sizes all at once; that would provide a better failure message in CI.

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,125 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
+from tvm import tir
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """Common function to return the internal representation

Review comment:
       Here I mean that there should be a one-line summary followed by further description if needed. Also, please document return types in the docstrings here.
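
   A hedged sketch of the shape this could take (wording illustrative; the return type follows the `relay.build` docstring elsewhere in this PR):
   ```
   @abstractmethod
   def get_internal_repr(self):
       """Return the executor configuration produced at build time.

       Returns
       -------
       config : str or tvm.tir.PrimFunc
           The graph JSON for the graph executor, or the generated
           PrimFunc for the AOT executor.
       """
       raise NotImplementedError
   ```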




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-825989178


   Hi @areusch,
   I addressed most of the comments except the stack_allocator. Let's discuss that a bit more on the thread here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620080936



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       Cool! I agree that alignment should be considered in the higher-level memory planning work. :)
   However, as shown here, it's trivial to support a global alignment (i.e., addresses produced by the allocator are aligned to a global granularity), which might be beneficial in the meantime.
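
   As a sketch, the rounding already used in the diff above gives that global granularity when applied to every allocation size (helper name hypothetical; constant name reused from the PR, value illustrative):
   ```
   #include <stdint.h>

   #define TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES 16  /* value illustrative */

   /* Round nbytes up so the next start address stays globally aligned;
    * (~nbytes + 1) == -nbytes, i.e. the padding to the next multiple. */
   static uint32_t aligned_size(uint32_t nbytes) {
     uint32_t offset = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
     return nbytes + offset;
   }
   ```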




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608200210



##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,35 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include <tvm_executor.h>\n";
+    code_ << "#ifdef __cplusplus\n";
+    code_ << "extern \"C\"\n";
+    code_ << "#endif\n";
+    code_ << "TVM_DLL int32_t " << ::tvm::runtime::symbol::tvm_run_func_prefix;
+    code_ << "(void* args, void* type_code, int num_args, void* out_value, void* "
+             "out_type_code, void* resource_handle);\n";
+    code_ << "const tvm_model_t network = {\n"

Review comment:
       This applies to all global symbols though (operators, lookup_param, etc.), and indeed I have a PR ready to add "name mangling" into the picture. Basically the idea is to let the user specify a prefix and append that prefix to all the global functions defined by the compiler. I haven't submitted it yet because I wanted to keep this code review easier :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-818846115


   Hi @tqchen, @areusch,
   I uploaded a new patch while we converge on the codegen changes [here](https://github.com/apache/tvm/pull/7785#discussion_r611859910).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620586188



##########
File path: python/tvm/relay/build_module.py
##########
@@ -213,7 +218,7 @@ def _build_module_no_factory(mod, target=None, target_host=None, params=None, mo
     return build(mod, target, params=params, mod_name=mod_name).module
 
 
-def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"):
+def build(ir_mod, target=None, target_host=None, params=None, mod_name="default", executor="graph"):

Review comment:
       In general, I am happy with an object being passed here, but I think that deserves an RFC of its own. For the time being I would not overload the target further, and I'd use this as a good excuse to write an "option object RFC".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r621356673



##########
File path: src/target/source/codegen_source_base.h
##########
@@ -155,7 +156,8 @@ runtime::Module CSourceModuleCreate(const String& code, const String& fmt,
  */
 runtime::Module CreateMetadataModule(
     const std::unordered_map<std::string, runtime::NDArray>& params, runtime::Module target_module,
-    const Array<runtime::Module>& ext_modules, Target target);
+    const Array<runtime::Module>& ext_modules, Target target,

Review comment:
       Great, I agree with that logic.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-828639039


   Hi @areusch,
   So, about solution 2): it also doesn't affect the logs, I think. Actually, it would be transparent to the logs because we are not touching the target.
   
   A major drawback of 1) is `tvmc`. If we go for 1), the user needs to specify the executor directly in the target via `tvmc`. When we later remove it from the target, the `tvmc` command will also have to change.
   
   With 2), we can have a flag in `tvmc`, e.g., `--executor`, and this will translate to whatever `relay.build` command we decide on. In general, why add something to the target if we know we are going to remove it from the target entirely (and when it is so straightforward to express as a `relay.build` parameter)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-812028482


   cc: @u99127 @mshawcroft @areusch @Mousius @MatthewARM @manupa-arm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] jroesch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
jroesch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608091225



##########
File path: include/tvm/runtime/crt/aot/tvm_backend.h
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_backend.h
+ * \brief Backend functions for the AOT executor
+ *
+ * These are not designed to be user-facing and may change without warning
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_

Review comment:
       I think we should try and generally reduce "ways to do something" in the code base because we already have many bifurcated mechanisms. If an interface isn't working we should redesign to encompass new use cases. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619868182



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       Ah, I'm sorry -- my previous comment was incorrect. You're indeed right that this checking mode wastes 4 bytes per tensor. I understand that we especially need to be cognizant of the alignment issue, so how about this proposal then:
   
   - the libmemory.a used in `make crttest` is built with the `crt_config.h` used to build `standalone_crt` for:
      1. the host machine
      2. tests launched from `test_crt.py`
   - in these cases, we can afford the extra cycles for a check, and I think this shouldn't affect test results since we aren't using accelerators from here.
   - so my proposal is: let's include this check by defining a flag in `src/runtime/crt/host/crt_config.h`. We can also include a commented copy of this flag in the template `crt_config.h`, but leave it off there. Finally, I'm not sure we should use `TVM_CRT_DEBUG`, since that may be spammy in unit tests; perhaps we could define another flag? (A sketch follows after this comment.)
   
   Let me know how this sounds to you. The main thing I'd like to achieve is that the unit tests of the CRT and of the AOT that are executed in CI run with this check enabled.
   
   BTW, we should probably discuss moving the alignment rules to a higher level than the runtime memory allocator. I think it's fine to leave them there for this PR, but it seems like this should be part of our higher-level memory planning work.
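
   Concretely, the flag part of the proposal above could look something like this (flag name hypothetical, pending agreement here):
   ```
   /* src/runtime/crt/host/crt_config.h -- on for host builds and `make crttest`,
    * so CI exercises the check: */
   #define TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK 1

   /* template crt_config.h -- ship it commented out, off by default: */
   /* #define TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK 1 */
   ```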




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros edited a comment on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-821404672


   Hi @areusch,
   Thanks for your comments. I left some questions where I didn't get exactly what you meant (sorry :) ). The main things I will do next week are:
   * Introduce a new intrinsic to support the `cpacked_func` call
   * Move `aot_executor.h` into an internal folder, so that it is not a public interface, and update the `aot_test.mk` makefile accordingly


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r624496567



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -0,0 +1,672 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_executor_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>

Review comment:
       this include is no longer needed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615040089



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -126,20 +125,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
 
     Parameters
     ----------
-    mod : tvm.relay.backend.graph_executor_factory.GraphExecutorFactoryModule
+    mod : tvm.relay.backend.executor_factory.ExecutorFactoryModule
         The return value of tvm.relay.build, which will be exported into Model Library Format.
     file_name : str
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = isinstance(mod, executor_factory.AOTExecutorFactoryModule)

Review comment:
       Can you help me understand here? I need the `is_aot` logic to avoid generating the memory path and to select the correct runtime. Either I add a `get_factory_type` to the factory or I inspect its type here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619408979



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I get the negative consequences (and therefore agree that we need a check) but I'm not sure how realistic they are to justify being on by default. If we really want to have the check by default -- @giuseros I'd suggest we'd need a separate buffer to hold the debug info, not interleaved with the data buffer.
   
   We can always compile with -D STACK_ALLOCATOR_CHECK_ENABLED to unit test; is there a concern about not being able to unit test because it's not on by default?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620584765



##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,36 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include \"tvm/runtime/crt/internal/aot_executor/aot_executor.h\"\n";

Review comment:
       I am not entirely sure I follow :). `tvm_model_t` is part of the internal interface to use AOT, and we agreed to move the executor interface to be internal for now. Also, why would this make the operator code internal?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619359010



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I disagree -- the user does not write the code that uses this -- it's generated by TVM -- and if TVM generates code that does not follow the pattern, that should be fixed in the code generator -- the tree-scoped tir.allocates always produce code in this manner. Therefore, I think the 4-byte overhead for each allocate should only be enabled to debug the code generator -- which I don't think a user would be doing -- so the default should be off, IMO.
   
   This is aimed at a deployment scenario.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619396774



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       For example, perhaps you are integrating this with an application which operates a radio, but _that_ library accidentally overwrites `next_alloc`. In this case, it's preferable to have our library fail predictably as soon as it detects memory corruption. Without the check here, we would hand out a corrupt memory pointer on the following allocate call.
   
   Anyway -- sorry, my comment here was mainly aimed at addressing @giuseros' question about unit testing. This was my rationale for solving the problem of how to unit test this by making it on-by-default. We should probably address productionization in a different PR, so to move forward here, we can just find another way to enable the check in unit tests -- that would address my primary concern, which is to prove that the generated code uses the APIs correctly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619408979



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I assume we are not passing the tvm_workspace_t struct to the radio application; in this example it would be accessing next_alloc via some pointer arithmetic on nearby data that got linked in the .data section?
   
   I get the negative consequences (and therefore agree that we need a check) but I'm not sure how realistic they are to justify being on by default. If we really want to have the check by default -- @giuseros I'd suggest we'd need a separate buffer to hold the debug info, not interleaved with the data buffer.
   
   We can always compile with -D STACK_ALLOCATOR_CHECK_ENABLED to unit test; is there a concern about not being able to unit test because it's not on by default?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r618503865



##########
File path: src/relay/backend/build_module.cc
##########
@@ -598,6 +587,8 @@ class RelayBuildModule : public runtime::ModuleNode {
   std::unordered_map<std::string, runtime::NDArray> params_;
   /*! \brief building output */
   BuildOutput ret_;
+  /*! \brief Executor used to execute the graph */

Review comment:
       specify the possible values

##########
File path: src/target/metadata_module.h
##########
@@ -33,12 +33,15 @@
 #include <string>
 #include <unordered_map>
 
+#include "../runtime/meta_data.h"
+
 namespace tvm {
 namespace codegen {
 
 runtime::Module CreateMetadataModule(
     const std::unordered_map<std::string, runtime::NDArray>& params,
-    tvm::runtime::Module target_module, const Array<runtime::Module>& ext_modules, Target target);
+    tvm::runtime::Module target_module, const Array<runtime::Module>& ext_modules, Target target,
+    int num_inputs = 1, int num_outputs = 1);

Review comment:
       I think this disagrees w/ the function impl
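
   (If the mismatch is about where the defaults live: in C++ a default argument may only be specified once, normally on the declaration, and the definition must not repeat it. A hedged sketch with a hypothetical function:)
   ```
   // decl.h -- defaults belong on the declaration
   int MakeModule(int num_inputs = 1, int num_outputs = 1);

   // decl.cc -- the definition repeats the parameters without the defaults
   int MakeModule(int num_inputs, int num_outputs) { return num_inputs + num_outputs; }
   ```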

##########
File path: src/runtime/meta_data.h
##########
@@ -37,6 +37,12 @@
 
 #include "runtime_base.h"
 
+/*! \brief Value used to indicate the graph executor. */
+static constexpr const char* kTvmExecutorGraph = "graph";

Review comment:
       Let's place these inside a top-level namespace, e.g. tvm::runtime.

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -258,24 +304,13 @@ void CodeGenCHost::VisitExpr_(const CallNode* op, std::ostream& os) {  // NOLINT
     this->stream << "TVMValue " << stack_name << "[" << size << "];\n";
     os << stack_name;
   } else if (op->op.same_as(builtin::tvm_call_packed_lowered())) {
-    const StringImmNode* s = op->args[0].as<StringImmNode>();
-    ICHECK(s != nullptr) << "tvm_call_packed_lowered expects first argument as function name";
-    int64_t begin = op->args[3].as<IntImmNode>()->value;
-    int64_t end = op->args[4].as<IntImmNode>()->value;
-    int64_t num_args = end - begin;
-    ICHECK_GE(num_args, 0);
-    std::string func_name = s->value;
-    // NOTE: cannot rely on GetUnique for global decl_stream declarations
-    // because it is reset between AddFunction().
-    std::string packed_func_name = func_name + "_packed";
-    if (declared_globals_.insert(packed_func_name).second) {
-      // Still reserve the name among unique names.
-      ICHECK(GetUniqueName(packed_func_name) == packed_func_name)
-          << "Expected name " << packed_func_name << " to not be taken";
-      decl_stream << "static void* " << packed_func_name << " = NULL;\n";
-    }
-    this->PrintGetFuncFromBackend(func_name, packed_func_name);
-    this->PrintFuncCall(packed_func_name, num_args);
+    auto function_info = GetFunctionInfo(op);
+    this->PrintGetFuncFromBackend(function_info.func_name, function_info.func_name_packed);
+    this->PrintFuncCall(function_info.func_name, function_info.num_args);

Review comment:
       I think `function_info.func_name_packed`

##########
File path: src/runtime/meta_data.h
##########
@@ -32,13 +32,54 @@
 
 #include <string>
 #include <unordered_map>
+#include <utility>
 #include <vector>
 
 #include "runtime_base.h"
 
+/*! \brief Value used to indicate the graph executor. */
+static constexpr const char* kTvmExecutorGraph = "graph";
+
+/*! \brief Value used to indicate the aot executor. */
+static constexpr const char* kTvmExecutorAot = "aot";
+
 namespace tvm {
 namespace runtime {
 
+/*!
+ * \brief Structure that can be optionally used by the executor codegen
+ */
+class MetadataNode : public Object {
+ public:
+  /*! \brief number of inputs of the main function */
+  int num_inputs = 1;

Review comment:
       would prefer to place the defaults on a constructor somewhere
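
   A hedged sketch of that (class shape illustrative; the real MetadataNode derives from Object and has more members):
   ```
   class MetadataNode {
    public:
     // Defaults live on a constructor instead of on field initializers.
     MetadataNode() : MetadataNode(/*num_inputs=*/1, /*num_outputs=*/1) {}
     MetadataNode(int num_inputs, int num_outputs)
         : num_inputs(num_inputs), num_outputs(num_outputs) {}

     int num_inputs;
     int num_outputs;
   };
   ```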

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -47,6 +47,7 @@ void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_s
   decl_stream << "#define TVM_EXPORTS\n";
   decl_stream << "#include \"tvm/runtime/c_runtime_api.h\"\n";
   decl_stream << "#include \"tvm/runtime/c_backend_api.h\"\n";
+

Review comment:
       needed?

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);

Review comment:
       Don't think we followed up on these?

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -60,37 +147,17 @@ def __init__(self, ir_mod, target, graph_json_str, libmod, libmod_name, params):
     def export_library(self, file_name, fcompile=None, addons=None, **kwargs):
         return self.module.export_library(file_name, fcompile, addons, **kwargs)
 
-    # Sometimes we want to get params explicitly.
-    # For example, we want to save its params value to
-    # an independent file.
+    def save_executor_config(self):

Review comment:
       Seems like `get_executor_config` would be a better name, since we are not actually writing to a file here. Alternatively, I think we eventually want to move export_model_library_format here, so we could just continue using get_graph_json() conditionally there rather than introducing a new generic function here.

##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -295,7 +297,11 @@ class BuiltinLower : public StmtExprMutator {
     Array<PrimExpr> packed_args = {op->args[0], scope.stack_value, scope.stack_tcode,
                                    ConstInt32(arg_stack_begin),
                                    ConstInt32(arg_stack_begin + op->args.size() - 1)};
-    return Call(DataType::Int(32), builtin::tvm_call_packed_lowered(), packed_args);
+    if (use_string_lookup) {

Review comment:
       can probably even further reduce to:
   ```
   auto builtin_call = use_string_lookup ? builtin::tvm_call_packed_lowered() : builtin::tvm_call_cpacked_lowered();
   return Call(DataType::Int(32), builtin_call, packed_args);
   ```

##########
File path: tests/cpp/relay_build_module_test.cc
##########
@@ -119,7 +119,7 @@ TEST(Relay, BuildModule) {
   targets.Set(0, llvm_tgt);
   auto relay_mod = tvm::IRModule::FromExpr(func);
   ICHECK(relay_mod.defined()) << "Module must be defined";
-  build_f(relay_mod, targets, llvm_tgt);
+  build_f(relay_mod, targets, llvm_tgt, "graph");

Review comment:
       use constant?
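
   i.e., presumably (assuming the `kTvmExecutorGraph` constant from `meta_data.h` is reachable from the test):
   ```
   build_f(relay_mod, targets, llvm_tgt, kTvmExecutorGraph);
   ```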

##########
File path: src/relay/backend/utils.h
##########
@@ -37,12 +37,24 @@
 #include <typeinfo>
 #include <unordered_map>
 #include <unordered_set>
+#include <utility>
 #include <vector>
 
+#include "../../runtime/meta_data.h"
+
 namespace tvm {
 namespace relay {
 namespace backend {
 
+/*! \brief Lowered outputs */

Review comment:
       can you expand the comment now that this is moved up to a .h file?
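
   e.g. something along these lines (wording and struct name illustrative):
   ```
   /*!
    * \brief Output of an executor codegen: the per-target lowered functions,
    * any external modules, the bound parameters, and executor metadata.
    */
   struct LoweredOutput { /* existing fields unchanged */ };
   ```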

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
+  uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
+  uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
+  uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
+
+  if (next_alloc > workspace_end) {
+    return NULL;
+  }
+
+  tvm_runtime_workspace->next_alloc = next_alloc;
+  return current_alloc;
+}
+
+tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
+  tvm_runtime_workspace->next_alloc = ptr;

Review comment:
       Don't think we followed up on these?

##########
File path: src/relay/backend/graph_plan_memory.cc
##########
@@ -209,15 +209,18 @@ class StorageAllocator : public StorageAllocaBaseVisitor {
     for (const auto& kv : token_map_) {
       std::vector<Integer> storage_ids;
       std::vector<Integer> device_types;
+      std::vector<Integer> sid_sizes_byte;

Review comment:
       I'm not sure this is exercised anywhere in a unit test. Can you add one? Right now we just assert on the len() but not the content.

##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -184,7 +184,9 @@ class BuiltinLower : public StmtExprMutator {
   }
   PrimExpr VisitExpr_(const CallNode* op) final {
     if (op->op.same_as(builtin::tvm_call_packed())) {
-      return MakeCallPacked(op);
+      return MakeCallPacked(op, true);

Review comment:
       Can you add `/* use_string_lookup */` near `true`?

##########
File path: python/tvm/relay/build_module.py
##########
@@ -243,10 +248,18 @@ def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"
     mod_name: Optional[str]
         The module name we will build
 
+    executor: Optional[str]
+        The type of executor to be used in order to run the model:
+            - If "graph" is specified, then the graph_executor will be used
+            - If "aot" is specified, then the aot_executor will be used
+
     Returns
     -------
-    graph_json : str
-        The json string that can be accepted by graph executor.
+    internal_repr : str or tir.PrimFunc

Review comment:
       maybe better termed `executor_config` or something?

##########
File path: tests/crt/aot_memory_test.cc
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include <gtest/gtest.h>
+#include <tvm/runtime/crt/stack_allocator.h>
+
+/*

Review comment:
       It would be great to add a test here for out-of-order freeing.
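
   A hedged sketch of what such a test could look like (assumes the LIFO check discussed on the stack_allocator thread is compiled in; the expected error code is illustrative):
   ```
   TEST(AOTMemory, OutOfOrderFree) {
     static uint8_t model_memory[128];
     tvm_workspace_t tvm_runtime_workspace;
     tvm_runtime_workspace.workspace = model_memory;
     tvm_runtime_workspace.workspace_size = sizeof(model_memory);
     tvm_runtime_workspace.next_alloc = model_memory;

     void* block_one = StackMemoryManager_Allocate(&tvm_runtime_workspace, 24);
     void* block_two = StackMemoryManager_Allocate(&tvm_runtime_workspace, 24);

     // Freeing block_one first violates LIFO ordering; with the check enabled
     // this should be reported instead of silently resetting next_alloc into
     // the middle of live data.
     ASSERT_NE(StackMemoryManager_Free(&tvm_runtime_workspace, block_one),
               kTvmErrorNoError);
     ASSERT_EQ(StackMemoryManager_Free(&tvm_runtime_workspace, block_two),
               kTvmErrorNoError);
   }
   ```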




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611859910



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Would be great to clarify the total number of changes needed for the AOT executor. Ideally the fact that we use the AOT executor should not impact the code generator.
   
   Instead we should have generic arguments and document them (instead of using AOT), e.g. a CRT mode where symbols are directly available:
   - A list of symbols being exposed, etc.
   - Calling convention (TVMFuncCall vs direct call)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620547664



##########
File path: python/tvm/relay/build_module.py
##########
@@ -213,7 +218,7 @@ def _build_module_no_factory(mod, target=None, target_host=None, params=None, mo
     return build(mod, target, params=params, mod_name=mod_name).module
 
 
-def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"):
+def build(ir_mod, target=None, target_host=None, params=None, mod_name="default", executor="graph"):

Review comment:
       cc @tqchen @jroesch would be great to get feedback on API changes here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-830084434


   Hi @tqchen, @areusch,
   Any more thoughts on this?
   
   Thanks,
   Giuseppe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611874827



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Thanks @giuseros! Yes, CRT is a more generic term than AOT, so it might strike the right balance.
   
   cc @areusch as well. We should explicitly document the assumptions being made in CRT.
   
   For example, if we really go with the direct function invocation route (instead of TVMFuncCall into a global symbol), then it might be harder for us to handle device functions that do not directly correspond to a symbol (which is fine, but the assumption needs documenting).
   
    





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r622143194



##########
File path: include/tvm/tir/builtin.h
##########
@@ -343,6 +343,16 @@ TVM_DLL const Op& tvm_stack_make_array();
  */
 TVM_DLL const Op& tvm_call_packed();
 
+/*!
+ * \brief See pesudo code
+ *
+ *  int tvm_call_packed(fname, TVMValue* args) {
+ *     (*fname)(args, type_code_of(args), len(args));
+ *     return 0;
+ *  }
+ */
+TVM_DLL const Op& tvm_call_cpacked();

Review comment:
       Yeah, I meant the docstring. A separate update PR is also good.
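
   For reference, a corrected sketch of that docstring (wording illustrative; the direct-call semantics follow the calling-convention discussion above):
   ```
   /*!
    * \brief See pseudo code
    *
    *  int tvm_call_cpacked(fname, TVMValue* args) {
    *     // Calls fname directly as a symbol, skipping the string-based
    *     // registry lookup that tvm_call_packed performs.
    *     (*fname)(args, type_code_of(args), len(args));
    *     return 0;
    *  }
    */
   TVM_DLL const Op& tvm_call_cpacked();
   ```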




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608186592



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_code", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    // Expand the parameter pack via array aggregate initialization; each
+    // element streams one argument into ss as a side effect (pre-C++17 fold idiom).
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output.
+    // A Var node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    reverse_params_lookup_.Set(expr, name);
+
+    // If the Constant node is an output node we need to copy the content of the parameter to the
+    // output. A Constant node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override {
+    throw std::runtime_error("GlobalVarNode not supported in AOT");
+  }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (reverse_params_lookup_.find(kv.first) != reverse_params_lookup_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);

Review comment:
       That would be nice. Can we do that? As in, can we add a tir::let statement to an entire PrimFunc?







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r621367983



##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,36 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include \"tvm/runtime/crt/internal/aot_executor/aot_executor.h\"\n";

Review comment:
       ah, right--we wanted to defer making a public API to a follow-on PR. that's still true, so perhaps we should keep this here for this PR. my thinking was that depending on a header file in `tvm/runtime/crt/internal` makes this code internal. however, when compiling a µTVM project, the model operator is compiled separately from the CRT, so the current compilation flow treats operator code as external right now.
   
   with that said, let's ignore this for now--as you said, we're keeping the AOT executor interface internal now, and we don't enforce this internal-external split e.g. by creating a separate include dir at build time now anyhow. 







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620080936



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       Cool! I agree that alignment should be considered in higher-level memory planning work. :)
   However, as shown here, it's trivial to support a global alignment (i.e., every address produced by the allocator is aligned to a global granularity), which might be beneficial in the meantime.
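   For illustration, a minimal sketch of that kind of globally aligned bump sizing (the alignment constant matches this PR; `AlignUp` and the `main` harness are illustrative only, not the proposed API):
   ```
   #include <cstddef>

   // Power-of-two global alignment, as used by this PR's stack allocator.
   #define TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES 16

   // Round a request up so every address the bump allocator hands out stays
   // aligned to the global granularity (alignment must be a power of two).
   static inline size_t AlignUp(size_t nbytes) {
     return (nbytes + TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1) &
            ~static_cast<size_t>(TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   }

   int main() {
     // AlignUp(1) == 16, AlignUp(16) == 16, AlignUp(17) == 32.
     return (AlignUp(17) == 32 && AlignUp(16) == 16) ? 0 : 1;
   }
   ```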
   
   I think @giuseros already proposed a new flag.
   
   @giuseros WDYT?







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608201217



##########
File path: include/tvm/runtime/crt/aot/tvm_backend.h
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_backend.h
+ * \brief Backend functions for the AOT executor
+ *
+ * These are not designed to be user-facing and may change without warning
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include "tvm_error.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*! Memory alignment for allocator */
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT
+#define TVM_RUNTIME_ALLOC_ALIGNMENT 16

Review comment:
       I think it has the same meaning, but it is exposed in a more C-friendly way 







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r610705074



##########
File path: include/tvm/runtime/crt/aot/tvm_executor.h
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_executor.h
+ * \brief TVM Executor for the Ahead-of-Time Runtime
+ *
+ * AOT models are described by the TVM model descriptor format
+ * which can be passed to tvm_runtime_run. These descriptors will be
+ * generated by the AOT compilation process. This can optionally be
+ * augmented with platform specific context to be passed to the TVM
+ * operators.
+ *
+ * Example:
+ * extern tvm_model_t my_network;
+ * int main() {
+ *    void* data = get_data();
+ *    void* output[4] = {0, 0, 0, 0};
+ *    void* inputs = {data};
+ *    void* outputs = {output};
+ *    tvm_context_t my_context = {
+ *      .driver = ...;
+ *    };
+ *    tvm_runtime_run(
+ *      &my_network,
+ *      inputs,
+ *      outputs,
+ *      &my_context
+ *    );
+ *    return 0;
+ * }
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+
+#include <stdint.h>
+
+#include "tvm_backend.h"
+#include "tvm_error.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!

Review comment:
       So would it be fine if we only leave the `tvm_context`? This would give us an easy entry point for the RTOS-driver-specific things we are working on in the near future







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615046675



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);

Review comment:
       one way to do my proposal below this is to store a "tag" word either before or after the region being returned here, then validate the tag in Free(). For instance:
   
   ```
   *((uint32_t*) next_alloc) = 0xabcd1234 ^ nbytes;
   next_alloc += 4;
   ```
   
   then in free, check that the tag word is correct:
   ```
   uint32_t tag = *(((uint32_t*) ptr) - 1);
   CHECK_EQ(tag, (((uintptr_t) tvm_runtime_workspace->next_alloc) - ((uintptr_t) ptr)) ^ 0xabcd1234, "tag did not match");
   ```

##########
File path: tests/python/relay/test_backend_graph_executor.py
##########
@@ -133,7 +133,7 @@ def test_plan_memory():
     storage_ids = set()
     device_types = set()
     for k, v in smap.items():
-        assert len(v) == 2
+        assert len(v) == 3

Review comment:
       here you're expanding the length assertion, but below, there are some rudimentary asserts on v[0] and v[1]. can you also add some assert on v[2]?

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
+  uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
+  uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
+  uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
+
+  if (next_alloc > workspace_end) {
+    return NULL;
+  }
+
+  tvm_runtime_workspace->next_alloc = next_alloc;
+  return current_alloc;
+}
+
+tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
+  tvm_runtime_workspace->next_alloc = ptr;

Review comment:
       I think this API depends on an e.g. FIFO ordering in calls. Specifically:
   ```
   Allocate(100, &a);
   Allocate(200, &b);
   Free(&b);
   Free(&a);
   ```
   
   is correct, while:
   
   ```
   Allocate(100, &a);
   Allocate(200, &b);
   Free(&a);
   ```
   
   is not. It's pretty easy to screw this up, so can we add a sanity-check here to assert that indeed we are calling Free in the correct order?
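    A minimal version of such a check, assuming a hypothetical `last_alloc` bookkeeping field that remembers the most recent block (the tag-word scheme sketched above generalizes this beyond one level; the error constant other than `kTvmErrorNoError` is hypothetical too):
    ```
    // Sketch only: reject frees that are not the most recent allocation.
    tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* ws, void* ptr) {
      if (ptr != ws->last_alloc) {                  // last_alloc: hypothetical field
        return kTvmErrorPlatformStackAllocBadFree;  // hypothetical error code
      }
      ws->next_alloc = (uint8_t*)ptr;
      return kTvmErrorNoError;
    }
    ```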







[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608167553



##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -247,7 +247,7 @@ class BuiltinLower : public StmtExprMutator {
     Array<PrimExpr> packed_args = {op->args[0], stack_value_, stack_tcode_,
                                    ConstInt32(arg_stack_begin),
                                    ConstInt32(arg_stack_begin + op->args.size() - 1)};
-    return Call(DataType::Int(32), builtin::tvm_call_packed_lowered(), packed_args);
+    return Call(op->dtype, builtin::tvm_call_packed_lowered(), packed_args);

Review comment:
       the return type of tvm_call_packed_lowered is int32 (a status code); the callee's actual return value is passed through the return-value argument instead of the return type, so it is supposed to be i32
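    In other words, the convention returns a status code and writes the real result through out-parameters, roughly like this generic illustration (not the exact TVM signature):
    ```
    #include <cstdio>

    // Generic illustration: the C return type is a status int; the computed
    // value travels via out-parameters, as in the packed-call convention.
    static int CallPacked(int arg, double* out_ret_value, int* out_ret_tcode) {
      *out_ret_value = arg * 2.0;  // the "real" result
      *out_ret_tcode = 2;          // a type-code-like discriminator
      return 0;                    // status code: 0 == success
    }

    int main() {
      double ret_value;
      int ret_tcode;
      int status = CallPacked(21, &ret_value, &ret_tcode);
      std::printf("status=%d value=%g tcode=%d\n", status, ret_value, ret_tcode);
      return status;
    }
    ```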







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r621447913



##########
File path: include/tvm/tir/builtin.h
##########
@@ -343,6 +343,16 @@ TVM_DLL const Op& tvm_stack_make_array();
  */
 TVM_DLL const Op& tvm_call_packed();
 
+/*!
+ * \brief See pseudo code
+ *
+ *  int tvm_call_cpacked(fname, TVMValue* args) {
+ *     (*fname)(args, type_code_of(args), len(args));
+ *     return 0;
+ *  }
+ */
+TVM_DLL const Op& tvm_call_cpacked();

Review comment:
       Hi @tqchen, do you mean updating the doc string? If the overall PR is fine, I would stick with this for now and update the comments in #7932. If that PR gets merged before AOT, then I would update the comment. What do you think?







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620580137



##########
File path: src/target/source/codegen_source_base.h
##########
@@ -155,7 +156,8 @@ runtime::Module CSourceModuleCreate(const String& code, const String& fmt,
  */
 runtime::Module CreateMetadataModule(
     const std::unordered_map<std::string, runtime::NDArray>& params, runtime::Module target_module,
-    const Array<runtime::Module>& ext_modules, Target target);
+    const Array<runtime::Module>& ext_modules, Target target,

Review comment:
       The reason why I gave it a default was that it's also invoked from `compiler.cc` for the vm executor. I can change `compiler.cc` to pass the default or leave the default in the header.







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r622139782



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+// LINT_C_FILE
+#include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
+#include <tvm/runtime/crt/logging.h>
+#endif
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  // reserve bytes at the end of the allocation such that
+  // next_alloc % TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES == 0.
+  uint32_t offset_bytes =
+      (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - nbytes) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
+  uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
+  uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
+  uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
+#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
+  if (next_alloc + STACK_ALLOCATOR_TAG_SIZE_BYTES > workspace_end) {
+    return NULL;

Review comment:
       Need to return the correct error type as expected by TVMPlatformAllocate.







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r618529957



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       i agree this isn't too invasive now, the main question is whether `executor` should become a top-level `tvm.relay.build` parameter or whether e.g. `runtime_config` or a similarly named struct should be there instead. I feel like at least these parameters may not belong with target (and there are probably more, these just come to mind):
   - executor
   - runtime
   - link-params (this one is arguable, but I don't think link-params fully describes the param placement)
   
   I think it would be great to have another PR which addresses the question of: how do we model runtime config in TVM targets, and how should changes there affect autotuning schedule keys? The last bit is the contentious part. For this PR, perhaps we could take the approach I did with runtime: define it as a Target attribute with no default, and elsewhere make kTvmExecutorGraph the default when it is unspecified. This way, in the usual non-aot case, we don't affect tuning logs. wdyt?
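    To make that concrete, a sketch of both sides (the `--executor=aot` spelling is assumed from this PR's tests; treat it as illustrative rather than settled):
    ```
    #include <tvm/target/target.h>

    int main() {
      // Opting in to AOT via a target attribute (spelling assumed from the tests).
      tvm::Target host("c --executor=aot");
      // The attribute itself has no default; callers fall back to the graph
      // executor when it is unspecified, leaving existing tuning logs untouched.
      tvm::String executor =
          host->GetAttr<tvm::String>("executor").value_or("graph");
      return executor == "aot" ? 0 : 1;
    }
    ```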

##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       cc @jroesch @tqchen 







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615040232



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
+  uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
+  uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
+  uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
+
+  if (next_alloc > workspace_end) {
+    return NULL;
+  }
+
+  tvm_runtime_workspace->next_alloc = next_alloc;
+  return current_alloc;
+}
+
+tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
+  tvm_runtime_workspace->next_alloc = ptr;

Review comment:
       Could you clarify? 







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r616604456



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +492,35 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);
+    if (executor_str == kTvmExecutorGraph) {

Review comment:
       nit: how about using a factory method here, e.g., `ExecutorCodegen::MakeCodegen(const String& exec_str)`?
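    i.e., something along these lines (a sketch; `kTvmExecutorAot` is assumed as the counterpart of the `kTvmExecutorGraph` constant shown above):
    ```
    // Hypothetical factory so BuildRelay dispatches on the executor string in
    // one place instead of branching inline.
    static std::unique_ptr<ExecutorCodegen> MakeCodegen(const String& exec_str) {
      if (exec_str == kTvmExecutorGraph) {
        return std::unique_ptr<ExecutorCodegen>(new GraphCodegen());
      } else if (exec_str == kTvmExecutorAot) {
        return std::unique_ptr<ExecutorCodegen>(new AOTCodegen());
      }
      LOG(FATAL) << "unknown executor: " << exec_str;
      return nullptr;
    }
    ```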

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -0,0 +1,247 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import tflite
+import os
+import io
+import struct
+import numpy as np
+import pathlib
+import shutil
+import subprocess
+import tempfile
+import tarfile
+import pytest
+
+import tvm
+from tvm import relay
+from tvm.relay import transform
+from tvm.relay.op.contrib import get_pattern_table
+from tvm.contrib import utils
+from tvm.relay.backend import compile_engine
+from tvm.contrib import utils
+from tvm.contrib import graph_executor
+from tvm.micro import export_model_library_format
+from tvm.relay import testing
+
+from aot_test_utils import *
+
+
+def test_conv_with_params():
+    RELAY_MODEL = """
+#[version = "0.0.5"]
+def @main(%data : Tensor[(1, 3, 64, 64), uint8], %weight : Tensor[(8, 3, 5, 5), int8]) {
+    %1 = nn.conv2d(
+         %data,
+         %weight,
+         padding=[2, 2],
+         channels=8,
+         kernel_size=[5, 5],
+         data_layout="NCHW",
+         kernel_layout="OIHW",
+         out_dtype="int32");
+  %1
+}
+"""
+    mod = tvm.parser.fromtext(RELAY_MODEL)
+    main_func = mod["main"]
+    shape_dict = {p.name_hint: p.checked_type.concrete_shape for p in main_func.params}
+    type_dict = {p.name_hint: p.checked_type.dtype for p in main_func.params}
+
+    weight_data = np.ones(shape_dict["weight"]).astype(type_dict["weight"])
+    input_data = np.ones(shape_dict["data"]).astype(type_dict["data"])
+
+    params = {"weight": weight_data}
+    inputs = {"data": input_data}
+    output_list = generate_ref_data(mod, inputs, params)
+
+    input_list = [input_data]
+    compile_and_run(mod, input_list, output_list, params)
+
+
+def test_add_with_params():
+    x = relay.var("x", shape=(1, 10))
+    y = relay.var("y", shape=(1, 10))
+    z = relay.add(x, y)
+    func = relay.Function([x, y], z)
+
+    x_in = np.ones((1, 10)).astype("float32")
+    y_in = np.random.uniform(size=(1, 10)).astype("float32")
+
+    params = {"x": x_in}
+    inputs = {"y": y_in}
+    output_list = generate_ref_data(func, inputs, params)
+
+    input_list = [y_in]
+    compile_and_run(func, input_list, output_list, params)
+
+
+def test_conv2d():
+    """Test a subgraph with a single conv2d operator."""
+
+    def conv2d_direct():
+        dtype = "float32"
+        ishape = (1, 32, 14, 14)
+        w1shape = (32, 32, 3, 3)
+
+        data0 = relay.var("data", shape=ishape, dtype=dtype)
+        weight0 = relay.var("weight", shape=w1shape, dtype=dtype)
+        out = relay.nn.conv2d(data0, weight0, kernel_size=(3, 3), padding=(1, 1))
+        main_f = relay.Function([data0, weight0], out)
+        mod = tvm.IRModule()
+        mod["main"] = main_f
+        mod = transform.InferType()(mod)
+
+        i_data = np.random.uniform(0, 1, ishape).astype(dtype)
+        w1_data = np.random.uniform(0, 1, w1shape).astype(dtype)
+
+        return mod, {"data": i_data, "weight": w1_data}, (1, 32, 14, 14)
+
+    def group_conv2d():
+        dtype = "float32"
+        ishape = (1, 32, 14, 14)
+        w2shape = (32, 1, 3, 3)
+
+        data0 = relay.var("data", shape=(ishape), dtype=dtype)
+        weight0 = relay.var("weight", shape=(w2shape), dtype=dtype)
+        out = relay.nn.conv2d(data0, weight0, kernel_size=(3, 3), padding=(1, 1), groups=32)
+        main_f = relay.Function([data0, weight0], out)
+        mod = tvm.IRModule()
+        mod["main"] = main_f
+        mod = transform.InferType()(mod)
+
+        i_data = np.random.uniform(0, 1, ishape).astype(dtype)
+        w_data = np.random.uniform(0, 1, w2shape).astype(dtype)
+
+        return mod, {"data": i_data, "weight": w_data}, (1, 32, 14, 14)
+
+    for mod, inputs, out_shape in [conv2d_direct(), group_conv2d()]:
+        output_list = generate_ref_data(mod, inputs)
+        input_list = [inputs["data"], inputs["weight"]]
+        compile_and_run(mod, input_list, output_list)
+
+
+def test_concatenate():
+    dtype = "float32"
+    x = relay.var("x", shape=(10, 5), dtype=dtype)
+    y = relay.var("y", shape=(10, 5), dtype=dtype)
+    t = relay.var("z", shape=(), dtype=dtype)
+    z = relay.concatenate((x, y), axis=1)
+    z = relay.add(z, t)
+    # Check result.
+    func = relay.Function([x, y, t], z)
+    x_data = np.random.rand(10, 5).astype(dtype)
+    y_data = np.random.rand(10, 5).astype(dtype)
+    t_data = np.random.uniform(size=()).astype(dtype)
+    inputs = {"x": x_data, "y": y_data, "z": t_data}
+
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["x"], inputs["y"], inputs["z"]]
+    compile_and_run(func, input_list, output_list)
+
+
+def test_nested_tuples():
+    x = relay.var("x", shape=(10,))
+    x1 = x + relay.const(1.0)
+    x2 = x1 + relay.const(1.0)
+    x3 = x2 + relay.const(1.0)
+    x4 = x3 + relay.const(1.0)
+    out = relay.Tuple([x1, relay.Tuple([relay.Tuple([x2, x3]), x4])])
+    func = relay.Function([x], out)
+
+    x_data = np.random.uniform(size=(10,)).astype(np.float32)
+    inputs = {"x": x_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [x_data]
+    compile_and_run(func, input_list, output_list)
+
+
+def test_tuple_getitem():
+    func = relay.Function([], relay.TupleGetItem(relay.Tuple([relay.const(1), relay.const(2)]), 0))
+    output_list = generate_ref_data(func, {})
+    input_list = []
+    compile_and_run(func, input_list, output_list)
+
+
+def test_id():
+    x = relay.var("x", "float32")
+    ident = relay.Function([x], x)
+    one = np.array(1.0, "float32")
+    inputs = {"x": one}
+    output_list = generate_ref_data(ident, inputs)
+    input_list = [one]
+    compile_and_run(ident, input_list, output_list)
+
+
+def test_add_const():
+    two = relay.add(relay.const(1), relay.const(1))
+    func = relay.Function([], two)
+    output_list = generate_ref_data(func, {})
+    input_list = []
+    compile_and_run(func, input_list, output_list)
+
+
+def test_mul_param():
+    x = relay.var("x", shape=(10, 10))
+    y = relay.var("y", shape=(1, 10))
+    func = relay.Function([x, y], relay.multiply(x, y))
+    x_data = np.random.rand(10, 10).astype("float32")
+    y_data = np.random.rand(1, 10).astype("float32")
+    inputs = {"x": x_data, "y": y_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["x"], inputs["y"]]
+    compile_and_run(func, input_list, output_list)
+
+
+def test_subtract():
+    i = relay.var("i", shape=[], dtype="int32")
+    sub = relay.subtract(i, relay.const(1, dtype="int32"))
+    func = relay.Function([i], sub, ret_type=relay.TensorType([], "int32"))
+    i_data = np.array(1, dtype="int32")
+    inputs = {"i": i_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["i"]]
+    compile_and_run(func, input_list, output_list)
+
+
+def test_tuple_output():
+    x = relay.var("x", shape=(6, 9))
+    y = relay.split(x, 3).astuple()
+    a = relay.TupleGetItem(y, 0)
+    b = relay.TupleGetItem(y, 1)
+    c = relay.TupleGetItem(y, 2)
+    out = relay.Tuple([a, b])
+    func = relay.Function([x], out)
+    x_data = np.random.rand(6, 9).astype("float32")
+    inputs = {"x": x_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["x"]]
+    compile_and_run(func, input_list, output_list)
+
+
+def test_mobilenet():
+    mod, params = testing.mobilenet.get_workload(batch_size=1)
+    data_shape = [int(x) for x in mod["main"].checked_type.arg_types[0].shape]
+    data = np.random.uniform(size=data_shape).astype("float32")
+    inputs = {"data": data}
+    output_list = generate_ref_data(mod, inputs, params)
+    input_list = [inputs["data"]]
+    compile_and_run(mod, input_list, output_list, params)
+

Review comment:
       Can we have the BYOC infra test `test_byoc_utvm` that is in tests/micro/zephyr/test_zephyr.py?

##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +492,35 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);
+    if (executor_str == kTvmExecutorGraph) {
+      executor_codegen_ = std::unique_ptr<ExecutorCodegen>(new GraphCodegen());
+    } else {
+      executor_codegen_ = std::unique_ptr<ExecutorCodegen>(new AOTCodegen());
+    }
 
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    executor_codegen_->Init(nullptr, targets_);
+    executor_codegen_->Codegen(func);
+
+    if (executor_str == kTvmExecutorGraph) {
+      ret_.graph_json = reinterpret_cast<GraphCodegen*>(executor_codegen_.get())->GetJSON();

Review comment:
       nit: I think we can have a virtual method `ExecutorCodegen::UpdateOutput(const BuildOutput& ret)` that implements the update as required by the specialized classes.
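    Roughly like the following sketch (the pointer parameter is a deliberate tweak so the method can write into the build output; names are otherwise illustrative):
    ```
    // Hypothetical virtual hook: each executor codegen fills in only the
    // fields of BuildOutput it owns.
    class ExecutorCodegen {
     public:
      virtual void UpdateOutput(BuildOutput* ret) = 0;
      virtual ~ExecutorCodegen() = default;
    };

    class GraphCodegen : public ExecutorCodegen {
     public:
      void UpdateOutput(BuildOutput* ret) override { ret->graph_json = GetJSON(); }
      // ... existing members elided ...
    };

    class AOTCodegen : public ExecutorCodegen {
     public:
      void UpdateOutput(BuildOutput* ret) override {}  // no JSON to propagate
      // ... existing members elided ...
    };
    ```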







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608168664



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -86,10 +86,13 @@ def _build_memory_map(graph_json):
     list :
         A list with one entry per storage id describing that memory.
     """
-    graph = json.loads(graph_json)
+    memory_map = []
+    if graph_str.startswith("primfn"):

Review comment:
       So, ok. The main point here is that I was trying to unify a "graph" representation. When we do:







[GitHub] [tvm] areusch merged pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch merged pull request #7785:
URL: https://github.com/apache/tvm/pull/7785


   





[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619491548



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       > this example is accessing the next_alloc via some pointer arithmetic on a nearby data that got linked in in the .data section?
   
   yeah exactly. common cause is passing in a bad pointer to some other data structure which then causes a library to corrupt the data. on x86, e.g. dereferencing a nullptr (e.g. trying to access a struct member with struct ptr==0) causes a segfault; on bare metal, you just read whatever program data happens to be there (often .text or from the SoC ROM).
   
   one thing with the current impl is that `tag` is just placed in the extra space given by `TVM_CRT_ALIGNMENT_BYTES`; because that's 16, there is always an extra word of memory available. the overhead here is a few cycles of compute. I don't see it as particularly large considering we are doing a lot more compute when e.g. doing a conv2d.
   
   > is there a concern of not being able to unit test because of its not on by default ?
   yeah exactly--but it would be fine to just turn it on for unit tests too. If you want to do that, perhaps we should create a `crt_config.h` for unit tests and then change the include path used for unit tests in StandaloneCrt.cmake and in the python code which builds the AOT makefile.







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619379316



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       So there are two main issues here:
   * We are using 4 bytes per block to check memory. Given the memory constraints on some devices, it might be too expensive. 
   * More importantly, we are losing the block alignment. There are some microcontrollers that would not allow non-aligned memory access. 
   For those two reasons, I would disable it by default and only enable it for specific use cases. I can use a different MACRO (e.g., STACK_ALLOCATOR_CHECK_ENABLED) that people can use to enable this feature, so it would not be only for debug. What do you think? Also adding @manupa-arm and @Mousius to the discussion.







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614264731



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       >> it might be harder for us to handle device functions that does not directly corresponds to a symbol
   >Could you give an example for such function?
   
   I think @tqchen is referring to e.g. functions which offload compute to devices. This would be e.g. functions produced by a non-`llvm` or `c` backend, so BYOC for µTVM or CUDA, Vulkan, etc. on traditional OS.
   
   The thing is, as I mentioned in the discuss forum post, these functions are actually launched from the `target_host` e.g. CPU by calling a closure returned from `GetFunction`. On traditional OS, this `GetFunction` + closure serves to both program the accelerator and launch the compute (split is unspecified between the two, but typically `GetFunction` does the programming).
   
   In micro-land, I don't recommend we adopt any sort of closures, and I also think that it's likely that the "programming" part will happen as part of compilation or as part of the Module initialization. So I'd recommend we assume that we'll move the "accelerator programming" step  _at least_ into some Module-init function. Or it may even be handled outside of the executor. Given that, `GetFunction` is essentially a no-op, and it makes sense to me to presume that the PackedFunc impl will launch and block on accelerated compute in these cases. As we move to handling asynchronous execution, we may need to modify this interface (but that will likely amount to a "launch" and a "sync" function, so not too different).
   
   >> Given the potential need to have a different path on the embedded setting(to avoid string lookup and use symbol), we should clarify and document what is the CRT standard so that the new code generator can refer to
   >So, about this, for me the "runtime" of the graph is a triplet of:
   >
   >Calling convention (registry, dynamic or plain)
   >Backend functions (platform specific vs defined in TVM)
   >Graph execution (aot vs graph)
   
   Runtime has been pretty overloaded in TVM, so given the recent PR to rename to GraphExecutor, I prefer we just use "runtime" to refer to `c_runtime_api.h` and `c_backend_api.h` implementation and dependencies of those. GraphExecutor is built on top of that, as should AOTExecutor be, so I think "runtime" should exclude Executor.
   
   I agree it's going to be an impossibly-long PR if we try to fit in the full general AOT integration, and don't think we should do that. I suggest that here, we just implement the logic needed to produce the top-level TIR function and test it, and then consider things like changes to runtime memory management in follow-on PRs. This might be going a bit too far backwards, but e.g. if it were to simplify things, we could even switch to emitting `tir.call_packed_lowered` in AOT codegen and then implement the direct call as a follow-on PR.
   
   >So, I am not sure we can come up in day1 with a full general aot integration. I would say :
   >
   >Let's stick with the current AOT definition for now (which is for sure more general that we had in mind at the >beginning :) )
   >I will remove the is_aot from the codegen, but I will use a more generic use_plain_calling_convention
   >Let's document the exact definition of CRT and AOT and let's try to generalize later
   
   I think this makes sense--let's clarify a couple points:
    - definition of AOT: just the piece that generates the top-level TIR function to serve as an Executor
    - definition of CRT: `c_runtime_api.h` and `c_backend_api.h` implementation and dependencies of those
   
   Now as to what we should put in this PR to move forward--the rest of this comment is my proposal, feel free to debate.
   
   I'd suggest we just limit this PR to the top-level TIR function which serves as part of the Executor implementation (e.g. `Executor#Run`). That would mean deferring:
    - Memory management changes--just use `TVMBackendAllocWorkspace` in this PR even for intermediate graph tensors
    - firmware-facing API--just enough to make test cases for the top-level TIR function run
    - calling convention: can use what you guys have put together here, or can even emit `tir.call_packed_lowered` nodes if that's easier. I think they are about equivalent, aside that `call_packed_lowered` uses `TVMBackendGetFuncFromEnv`.
   
   Near-future directions (e.g. stuff implemented here I am not saying we should drop; just move to next PR):
   - Memory management: let's discuss this soon and implement the change as a GraphPlanMemory change. That should get us to a solution that works for all executors and is extensible should users wish to implement their own allocation scheme. I do not view the existing page-based allocator as particularly fit for production standalone inference.
   - PackedFunc invocation from other PackedFunc: let's discuss this with an eye to both BYOC and `c`/`llvm` modules, and come up with an efficient solution that makes sense for a pure-C world. This mostly doesn't impact the firmware-facing API, so let's initially discuss this separately.
   - firmware-facing API: Let's also discuss this, either as part of your initial RFC or as a new thread. 
   
   Would this make sense to you guys as a way forward? Happy to discuss other options too. I do think it'd be better to merge a part of this PR and move forward with the others, rather than create one giant PR and discuss all of the pieces at once. At the same time, I do realize we should ensure we're working towards a vision that makes sense for microcontrollers, which involves many pieces given how different the deployment environment is from traditional OS.







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611646784



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_code", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases

Review comment:
       Well, I was only copying the existing graph executor tests here. My only point is that implementing something more efficient for an edge case might not be worth it. If you think otherwise, I can easily add a TODO about this.
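
For context, the byte-copy loop that `copy_to_output` emits in TIR is roughly equivalent to the following C. This is a sketch for illustration only (assuming contiguous byte buffers; the function name is made up):

```c
#include <stddef.h>
#include <stdint.h>

/* Rough C equivalent of the emitted TIR: copy `size` bytes from the
 * input tensor's data pointer to the output tensor's data pointer. */
static void copy_to_output_sketch(uint8_t* out_data, const uint8_t* in_data,
                                  size_t size) {
  for (size_t i = 0; i < size; ++i) {
    out_data[i] = in_data[i]; /* one tir::Load + tir::Store per iteration */
  }
}
```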







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611884529



##########
File path: src/runtime/crt/common/aot_backend_api.c
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <assert.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <tvm/runtime/c_backend_api.h>
+#include <tvm/runtime/crt/error_codes.h>
+#include <tvm/runtime/crt/logging.h>
+#include <tvm/runtime/crt/platform.h>
+
+#include "crt_config.h"
+
+void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes, int dtype_code_hint,

Review comment:
       okay so--sorry, my previous comment was a bit ambiguous: it describes what I want to do in the long term. For the purposes of this PR, is it possible to avoid introducing `aot_backend_api.c`, particularly since it isn't any different right now from `crt_backend_api.c`?
   
    to follow on to my comment and the discussion: I don't disagree that we should consider a less memory-intensive lookup mechanism (and thereby change the implementation of `TVMBackendRegisterSystemLibSymbol` for `runtime.SystemLib`). That brings up the conversation about the firmware-facing API--one I do think we should have--but there are a couple of different things at play there, and I think it's something we should tackle separately from this PR.
   
    i'd propose that this PR just focus on the TIR top-level function plus the test strategy and coverage for it. That does spill over into the runtime a bit--e.g. invoking said top-level function is obviously part of the test strategy--but I think we should implement something here that suffices to test the AOT as used in the C runtime, and then move on to a conversation about what to do with the `crt/runtime` directory and the firmware-facing API in general. That's the basis on which I'm suggesting we stick with just `crt_backend_api.c` for this PR.







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608893253



##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,35 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include <tvm_executor.h>\n";
+    code_ << "#ifdef __cplusplus\n";
+    code_ << "extern \"C\"\n";
+    code_ << "#endif\n";
+    code_ << "TVM_DLL int32_t " << ::tvm::runtime::symbol::tvm_run_func_prefix;
+    code_ << "(void* args, void* type_code, int num_args, void* out_value, void* "
+             "out_type_code, void* resource_handle);\n";
+    code_ << "const tvm_model_t network = {\n"

Review comment:
       okay that makes sense to me
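
For context, the emitter above prints roughly the following C, assuming the run-function symbol is the bare `tvm_run_func_prefix` ("tvm__run_func") as written by this code path:

```c
#include <tvm_executor.h>
#ifdef __cplusplus
extern "C"
#endif
TVM_DLL int32_t tvm__run_func(void* args, void* type_code, int num_args,
                              void* out_value, void* out_type_code,
                              void* resource_handle);
/* ...followed by "const tvm_model_t network = {" and field initializers
 * that are elided in the quoted diff. */
```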







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r618573550



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);

Review comment:
       Sorry, my bad. I added them to a local copy and didn't check them in.
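
For anyone else puzzling over this line: `(~nbytes + 1)` is `-nbytes` in two's complement, so masking with `TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1` yields exactly the padding needed to round `nbytes` up to the next multiple of the (power-of-two) alignment. A quick sanity check of the arithmetic, assuming the default 16-byte alignment:

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN 16 /* default TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES */

/* Padding that rounds nbytes up to the next multiple of ALIGN. */
static uint32_t pad(int32_t nbytes) { return (~nbytes + 1) & (ALIGN - 1); }

int main(void) {
  assert(pad(16) == 0);  /* already aligned */
  assert(pad(10) == 6);  /* 10 + 6 == 16 */
  assert(pad(17) == 15); /* 17 + 15 == 32 */
  return 0;
}
```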







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615043120



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -211,21 +214,34 @@ void CodeGenCHost::PrintGetFuncFromBackend(const std::string& func_name,
   this->stream << "}\n";
 }
 
-void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, int num_args) {
+void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, PrimExpr values,
+                                 int num_args) {
   this->PrintIndent();
+  std::string stack_value = "stack_value";

Review comment:
       I might try to remove this; I don't think I need it anymore.







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619396142



##########
File path: include/tvm/runtime/crt/stack_allocator.h
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+#ifndef TVM_RUNTIME_CRT_STACK_ALLOCATOR_H_
+#define TVM_RUNTIME_CRT_STACK_ALLOCATOR_H_
+#include <stddef.h>
+#include <stdint.h>
+
+#include "error_codes.h"
+
+#define STACK_ALLOCATOR_TAG 0xabcd1234
+#define STACK_ALLOCATOR_TAG_SIZE_BYTES 4
+
+/*! Memory alignment for allocator */
+
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES
+#define TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES 16

Review comment:
       I think @Mousius and @manupa-arm can comment more on this







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619868182



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       ah i'm sorry--my previous comment was incorrect. you're indeed right that this checking mode wastes 4 bytes per tensor. I understand that we especially need to be cognizant of the alignment issue, so how about this proposal then:
   
   - the libmemory.a used in `make crttest` is built with the `crt_config.h` used to build `standalone_crt` for:
      1. the host machine
      2. tests launched from `test_crt.py`
   - in these cases, we can afford the extra cycles for a check, and I think this shouldn't affect test results since we aren't using accelerators from here.
   - so my proposal is: let's include this check by defining a flag in `src/runtime/crt/host/crt_config.h`. we can also include a commented copy of this flag in the template `crt_config.h`, but leave it off there. finally, i'm not sure we should use `TVM_CRT_DEBUG`, since that may be spammy in unit tests; but perhaps we could define another flag?
   
   let me know how this sounds to you. the main thing i'd like to achieve is that unit tests of the CRT and of the AOT that are executed in CI are run with this check enabled. 
   
   btw, we should probably discuss moving the alignment rules to a higher level than the runtime memory allocator. I think it's fine to leave it there for this PR, but seems like it should be part of our higher-level memory planning work.
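
To make the shape of that proposal concrete, here is a sketch of the guard; the flag name `TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK` is hypothetical, not something agreed in this thread:

```c
/* In src/runtime/crt/host/crt_config.h (and present but commented out in
 * the template crt_config.h), a hypothetical opt-in flag:
 *   #define TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK 1
 */

/* In stack_allocator.c, the extra bookkeeping would then be guarded: */
#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_LIFO_CHECK
  /* Stamp STACK_ALLOCATOR_TAG after each block and verify it on free,
   * spending STACK_ALLOCATOR_TAG_SIZE_BYTES (4) per allocation; a real
   * implementation must also keep the next allocation aligned. */
#endif
```

With that, CI builds of the CRT and AOT unit tests get the check by default, while deployment builds pay nothing unless they opt in.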







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608907358



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_code", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // A Var node can only produce a single output

Review comment:
       that's true, but here you presume that exactly one DLTensor ever occupies that storage_id, no? That's only safe because you are using this function with params and inputs, which are the only storage_ids for which that is guaranteed to be true.







[GitHub] [tvm] jwfromm commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
jwfromm commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-812055852


   Thanks for this awesome contribution @giuseros, it's definitely a big milestone for the project. I looked through your changes and didn't find any docs or tutorials on how to use the AOT. Although there are some tests that show it being used, I think a doc would go a long way toward encouraging others to use this approach.





[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-814383996


   Hi @jwfromm ,
   I am happy to add docs for AOT. Could you just point me to an example of what you mean by doc/tutorial? Where should it go in the codebase?
   
   Thanks!





[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611876257



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Expanding further: we have made a set of assumptions about the availability of TVM's runtime API. On the default path, that contract is runtime/c_runtime_api and runtime/c_backend_api, both of which are clearly documented.
   
    Given the potential need for a different path in the embedded setting (to avoid string lookup and use symbols directly), we should clarify and document what the CRT standard is, so that the new code generator has something to refer to.
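
For reference, the default contract being described reduces to a small set of C symbols that generated code may call. A sketch of the most relevant declarations, paraphrased from `c_backend_api.h` (not a complete listing):

```c
#include <stdint.h>

typedef void* TVMFunctionHandle;

/* Workspace allocation pair used by generated operator code. */
void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes,
                               int dtype_code_hint, int dtype_bits_hint);
int TVMBackendFreeWorkspace(int device_type, int device_id, void* ptr);

/* String-based function lookup used on the packed-call path; this is the
 * lookup the embedded path would like to replace with direct symbols. */
int TVMBackendGetFuncFromEnv(void* mod_node, const char* func_name,
                             TVMFunctionHandle* out);
```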







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r607217733



##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -0,0 +1,258 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import tflite
+import os
+import io
+import struct
+import numpy as np
+import pathlib
+import shutil
+import subprocess
+import tempfile
+import tarfile
+
+
+import tvm
+from tvm import relay
+from tvm.relay import transform
+from tvm.relay.op.contrib import get_pattern_table
+from tvm.contrib import utils
+from tvm.relay.backend import compile_engine
+from tvm.contrib import utils
+from tvm.contrib import graph_runtime
+from tvm.micro import export_model_library_format
+from tvm.relay import testing
+
+from infra import *
+
+
+def test_conv_with_params():
+    RELAY_MODEL = """
+#[version = "0.0.5"]
+def @main(%data : Tensor[(1, 3, 64, 64), uint8], %weight : Tensor[(8, 3, 5, 5), int8]) {
+    %1 = nn.conv2d(
+         %data,
+         %weight,
+         padding=[2, 2],
+         channels=8,
+         kernel_size=[5, 5],
+         data_layout="NCHW",
+         kernel_layout="OIHW",
+         out_dtype="int32");
+  %1
+}
+"""
+    mod = tvm.parser.fromtext(RELAY_MODEL)
+    main_func = mod["main"]
+    shape_dict = {p.name_hint: p.checked_type.concrete_shape for p in main_func.params}
+    type_dict = {p.name_hint: p.checked_type.dtype for p in main_func.params}
+
+    weight_data = np.ones(shape_dict["weight"]).astype(type_dict["weight"])
+    input_data = np.ones(shape_dict["data"]).astype(type_dict["data"])
+
+    params = {"weight": weight_data}
+    inputs = {"data": input_data}
+    output_list = generate_ref_data(mod, inputs, params)
+
+    input_list = [input_data]
+    verify_source(mod, input_list, output_list, params)
+
+
+def test_add_with_params():
+    x = relay.var("x", shape=(1, 10))
+    y = relay.var("y", shape=(1, 10))
+    z = relay.add(x, y)
+    func = relay.Function([x, y], z)
+
+    x_in = np.ones((1, 10)).astype("float32")
+    y_in = np.random.uniform(size=(1, 10)).astype("float32")
+
+    params = {"x": x_in}
+    inputs = {"y": y_in}
+    output_list = generate_ref_data(func, inputs, params)
+
+    input_list = [y_in]
+    verify_source(func, input_list, output_list, params)
+
+
+def test_conv2d():
+    """Test a subgraph with a single conv2d operator."""
+
+    def conv2d_direct():
+        dtype = "float32"
+        ishape = (1, 32, 14, 14)
+        w1shape = (32, 32, 3, 3)
+
+        data0 = relay.var("data", shape=ishape, dtype=dtype)
+        weight0 = relay.var("weight", shape=w1shape, dtype=dtype)
+        out = relay.nn.conv2d(data0, weight0, kernel_size=(3, 3), padding=(1, 1))
+        main_f = relay.Function([data0, weight0], out)
+        mod = tvm.IRModule()
+        mod["main"] = main_f
+        mod = transform.InferType()(mod)
+
+        i_data = np.random.uniform(0, 1, ishape).astype(dtype)
+        w1_data = np.random.uniform(0, 1, w1shape).astype(dtype)
+
+        return mod, {"data": i_data, "weight": w1_data}, (1, 32, 14, 14)
+
+    def group_conv2d():
+        dtype = "float32"
+        ishape = (1, 32, 14, 14)
+        w2shape = (32, 1, 3, 3)
+
+        data0 = relay.var("data", shape=(ishape), dtype=dtype)
+        weight0 = relay.var("weight", shape=(w2shape), dtype=dtype)
+        out = relay.nn.conv2d(data0, weight0, kernel_size=(3, 3), padding=(1, 1), groups=32)
+        main_f = relay.Function([data0, weight0], out)
+        mod = tvm.IRModule()
+        mod["main"] = main_f
+        mod = transform.InferType()(mod)
+
+        i_data = np.random.uniform(0, 1, ishape).astype(dtype)
+        w_data = np.random.uniform(0, 1, w2shape).astype(dtype)
+
+        return mod, {"data": i_data, "weight": w_data}, (1, 32, 14, 14)
+
+    for mod, inputs, out_shape in [conv2d_direct(), group_conv2d()]:
+        output_list = generate_ref_data(mod, inputs)
+        input_list = [inputs["data"], inputs["weight"]]
+        verify_source(mod, input_list, output_list)
+
+
+def test_concatenate():
+    dtype = "float32"
+    x = relay.var("x", shape=(10, 5), dtype=dtype)
+    y = relay.var("y", shape=(10, 5), dtype=dtype)
+    t = relay.var("z", shape=(), dtype=dtype)
+    z = relay.concatenate((x, y), axis=1)
+    z = relay.add(z, t)
+    # Check result.
+    func = relay.Function([x, y, t], z)
+    x_data = np.random.rand(10, 5).astype(dtype)
+    y_data = np.random.rand(10, 5).astype(dtype)
+    t_data = np.random.uniform(size=()).astype(dtype)
+    inputs = {"x": x_data, "y": y_data, "z": t_data}
+
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["x"], inputs["y"], inputs["z"]]
+    verify_source(func, input_list, output_list)
+
+
+def test_nested_tuples():
+    x = relay.var("x", shape=(10,))
+    x1 = x + relay.const(1.0)
+    x2 = x1 + relay.const(1.0)
+    x3 = x2 + relay.const(1.0)
+    x4 = x3 + relay.const(1.0)
+    out = relay.Tuple([x1, relay.Tuple([relay.Tuple([x2, x3]), x4])])
+    func = relay.Function([x], out)
+
+    x_data = np.random.uniform(size=(10,)).astype(np.float32)
+    inputs = {"x": x_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [x_data]
+    verify_source(func, input_list, output_list)
+
+
+def test_tuple_getitem():
+    func = relay.Function([], relay.TupleGetItem(relay.Tuple([relay.const(1), relay.const(2)]), 0))
+    output_list = generate_ref_data(func, {})
+    input_list = []
+    verify_source(func, input_list, output_list)
+
+
+def test_id():
+    x = relay.var("x", "float32")
+    ident = relay.Function([x], x)
+    one = np.array(1.0, "float32")
+    inputs = {"x": one}
+    output_list = generate_ref_data(ident, inputs)
+    input_list = [one]
+    verify_source(ident, input_list, output_list)
+
+
+def test_add_const():
+    two = relay.add(relay.const(1), relay.const(1))
+    func = relay.Function([], two)
+    output_list = generate_ref_data(func, {})
+    input_list = []
+    verify_source(func, input_list, output_list)
+
+
+def test_mul_param():
+    x = relay.var("x", shape=(10, 10))
+    y = relay.var("y", shape=(1, 10))
+    func = relay.Function([x, y], relay.multiply(x, y))
+    x_data = np.random.rand(10, 10).astype("float32")
+    y_data = np.random.rand(1, 10).astype("float32")
+    inputs = {"x": x_data, "y": y_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["x"], inputs["y"]]
+    verify_source(func, input_list, output_list)
+
+
+def test_subtract():
+    i = relay.var("i", shape=[], dtype="int32")
+    sub = relay.subtract(i, relay.const(1, dtype="int32"))
+    func = relay.Function([i], sub, ret_type=relay.TensorType([], "int32"))
+    i_data = np.array(1, dtype="int32")
+    inputs = {"i": i_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["i"]]
+    verify_source(func, input_list, output_list)
+
+
+def test_tuple_output():
+    x = relay.var("x", shape=(6, 9))
+    y = relay.split(x, 3).astuple()
+    a = relay.TupleGetItem(y, 0)
+    b = relay.TupleGetItem(y, 1)
+    c = relay.TupleGetItem(y, 2)
+    out = relay.Tuple([a, b])
+    func = relay.Function([x], out)
+    x_data = np.random.rand(6, 9).astype("float32")
+    inputs = {"x": x_data}
+    output_list = generate_ref_data(func, inputs)
+    input_list = [inputs["x"]]
+    verify_source(func, input_list, output_list)
+
+
+def test_mobilenet():
+    mod, params = testing.mobilenet.get_workload(batch_size=1)
+    data_shape = [int(x) for x in mod["main"].checked_type.arg_types[0].shape]
+    data = np.random.uniform(size=data_shape).astype("float32")
+    inputs = {"data": data}
+    output_list = generate_ref_data(mod, inputs, params)
+    input_list = [inputs["data"]]
+    verify_source(mod, input_list, output_list, params)
+
+
+if __name__ == "__main__":

Review comment:
    ```python
    import sys

    import pytest

    if __name__ == "__main__":
        sys.exit(pytest.main([__file__] + sys.argv[1:]))
    ```

##########
File path: include/tvm/runtime/crt/aot/tvm_error.h
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_error.h
+ * \brief Defines a subset of error codes returned by the CRT AOT executor.
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_ERROR_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_ERROR_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define TVM_CRT_ERROR_CATEGORY_Pos 8
+#define TVM_CRT_ERROR_CATEGORY_Msk (0xff << TVM_CRT_ERROR_CATEGORY_Pos)
+#define TVM_CRT_ERROR_CODE_Pos 0
+#define TVM_CRT_ERROR_CODE_Msk (0xff << TVM_CRT_ERROR_CODE_Pos)
+
+#define DEFINE_TVM_CRT_ERROR(category, code) \
+  (((category) << TVM_CRT_ERROR_CATEGORY_Pos) | ((code) << TVM_CRT_ERROR_CODE_Pos))
+typedef enum {
+  kTvmErrorCategoryPlatform = 5,
+  kTvmErrorCategoryFunctionCall = 8,
+} tvm_crt_error_category_t;
+
+typedef enum {

Review comment:
       same here--i'd like to reuse include/tvm/runtime/crt/error_codes.h for both AOT and GraphExecutors. can we merge them?
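
Whichever header these macros end up in, the encoding itself is easy to verify: category in bits 8-15, code in bits 0-7. A small self-contained check of the arithmetic:

```c
#include <assert.h>

#define TVM_CRT_ERROR_CATEGORY_Pos 8
#define TVM_CRT_ERROR_CODE_Pos 0
#define DEFINE_TVM_CRT_ERROR(category, code) \
  (((category) << TVM_CRT_ERROR_CATEGORY_Pos) | ((code) << TVM_CRT_ERROR_CODE_Pos))

int main(void) {
  /* kTvmErrorCategoryPlatform is 5, so code 2 in that category packs to
   * (5 << 8) | 2 == 0x0502. */
  assert(DEFINE_TVM_CRT_ERROR(5, 2) == 0x0502);
  return 0;
}
```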

##########
File path: include/tvm/runtime/crt/aot/tvm_backend.h
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_backend.h
+ * \brief Backend functions for the AOT executor
+ *
+ * These are not designed to user-facing and may change without warning
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_

Review comment:
       i'd really like to reuse the backend code from `src/runtime/crt/common/crt_backend_api.c`. It's completely fine with me if we need to refactor the `common` lib to move code you see as cruft for inference deployment into a separate library (e.g. `backend` and `runtime` libraries). Any reason we need to keep a separate AOT version?

##########
File path: include/tvm/runtime/crt/aot/tvm_executor.h
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_executor.h
+ * \brief TVM Executor for the Ahead-of-Time Runtime
+ *
+ * AOT models are described by the TVM model descriptor format
+ * which can be passed to tvm_runtime_run. These descriptors will be
+ * generated by the AOT compilation process. This can optionally be
+ * augmented with platform specific context to be passed to the TVM
+ * operators.
+ *
+ * Example:

Review comment:
       yeah so I like the idea of building a new, deployment-focused API (we have an RPC-server-focused API in `src/runtime/crt/utvm_rpc_server`--why should we not have a deployment-focused one?). But can we make it work for both GraphRuntime and AOT? We could place it in e.g. `include/tvm/runtime/crt/standalone_api.h` or `include/tvm/runtime/crt/inference.h`.

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -86,10 +86,13 @@ def _build_memory_map(graph_json):
     list :
         A list with one entry per storage id describing that memory.
     """
-    graph = json.loads(graph_json)
+    memory_map = []
+    if graph_str.startswith("primfn"):

Review comment:
       can you say more about this? perhaps we should rework this function to consume metadata directly rather than the graph JSON--I don't want to overload it to contain multiple formats. It should just contain the GraphExecutor graph_json parameter.

##########
File path: include/tvm/runtime/module.h
##########
@@ -230,6 +230,8 @@ constexpr const char* tvm_module_main = "__tvm_main__";
 constexpr const char* tvm_param_prefix = "__tvm_param__";
 /*! \brief A PackedFunc that looks up linked parameters by storage_id. */
 constexpr const char* tvm_lookup_linked_param = "_lookup_linked_param";
+/*! \brief The main AOT executor function */
+constexpr const char* tvm_run_func_prefix = "tvm__run_func";

Review comment:
       (before merging) let's break this into a separate change

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -132,14 +135,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = False
+    for v in mod.target.values():
+        if v.attrs.get("executor", "graph_runtime") == "aot":
+            is_aot = True
+            break
+
+    runtime = ["graph"]

Review comment:
       place this in the else: block

##########
File path: src/relay/backend/build_module.cc
##########
@@ -524,7 +556,8 @@ class RelayBuildModule : public runtime::ModuleNode {
     }
 
     auto ext_mods = graph_codegen_->GetExternalModules();
-    ret_.mod = tvm::codegen::CreateMetadataModule(ret_.params, ret_.mod, ext_mods, GetTargetHost());
+    ret_.mod = tvm::codegen::CreateMetadataModule(ret_.params, ret_.mod, ext_mods, GetTargetHost(),
+                                                  graph_codegen_->GetAOTMetdata());

Review comment:
       typo?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output.
+    // A Var node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    reverse_params_lookup_.Set(expr, name);
+
+    // If the Constant node is an output node we need to copy the content of the parameter to the
+    // output. A Constant node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (reverse_params_lookup_.find(kv.first) != reverse_params_lookup_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief nodes */

Review comment:
       can you cleanup the docstrings and add one per var?
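
       For example, something along these lines (a sketch only; the element
       types are inferred from the usages above, not from the real
       declarations):

           /*! \brief Input variables of the Relay function, in signature order */
           std::vector<Expr> input_vars_;
           /*! \brief Parameters of the generated main PrimFunc: inputs followed by outputs */
           Array<tir::Var> main_signature_;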

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use

Review comment:
       @ZihengJiang can you look at this? I thought it was fixed?

##########
File path: src/runtime/crt/aot/tvm_executor.c
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+/*!
+ * \file src/runtime/crt/aot/tvm_executor.c
+ * \brief Internal implementation of the AOT Executor
+ */
+
+#include "tvm_executor.h"
+
+#include <dlpack/dlpack.h>
+
+#include "tvm_backend.h"
+#include "tvm_error.h"
+
+tvm_workspace_t* tvm_runtime_workspace;
+
+tvm_crt_error_t tvm_runtime_run(const tvm_model_t* model, void** inputs, void** outputs,
+                                tvm_context_t* context) {
+  static DLContext fake_ctx = {kDLCPU, 0};
+  static int64_t fake_dims = 0;
+  static int64_t fake_shape = {0};
+
+  DLTensor tensors[model->num_input_tensors + model->num_output_tensors];     // NOLINT
+  TVMValue tvm_values[model->num_input_tensors + model->num_output_tensors];  // NOLINT
+  int32_t tvm_typeids[model->num_input_tensors + model->num_output_tensors];  // NOLINT
+
+  for (int i = 0; i < model->num_input_tensors; i++) {
+    tensors[i] = (DLTensor){
+        .ctx = fake_ctx,
+        .data = inputs[i],
+        .shape = &fake_shape,
+        .ndim = fake_dims,
+        .byte_offset = 0,
+        .strides = NULL,
+    };
+    tvm_values[i].v_handle = &tensors[i];
+  }
+
+  for (int i = 0; i < model->num_output_tensors; i++) {
+    tensors[model->num_input_tensors + i] = (DLTensor){
+        .ctx = fake_ctx,
+        .data = outputs[i],
+        .shape = &fake_shape,
+        .ndim = fake_dims,
+        .byte_offset = 0,
+        .strides = NULL,
+    };
+    tvm_values[model->num_input_tensors + i].v_handle = &tensors[model->num_input_tensors + i];
+  }
+
+  return model->run_func(&tvm_values, &tvm_typeids, 0, NULL, 0, context);
+}
+
+void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes, int dtype_code_hint,
+                               int dtype_bits_hint) {
+  uint32_t offset = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT - 1);

Review comment:
       offset_bytes
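
       (and since the expression is fairly opaque, a short comment spelling
       out the math would help too; a sketch with the rename applied:)

           /* Padding needed to round nbytes up to the next multiple of the
              (power-of-two) alignment, i.e. (-nbytes) mod ALIGNMENT;
              e.g. with 16-byte alignment: nbytes = 24 -> 8, nbytes = 32 -> 0. */
           uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT - 1);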

##########
File path: include/tvm/runtime/crt/aot/tvm_backend.h
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_backend.h
+ * \brief Backend functions for the AOT executor
+ *
+ * These are not designed to be user-facing and may change without warning
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_BACKEND_H_
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include "tvm_error.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*! Memory alignment for allocator */
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT
+#define TVM_RUNTIME_ALLOC_ALIGNMENT 16

Review comment:
       can you specify a unit, e.g. `TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES`
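
       i.e. something like:

           /*! Memory alignment for the allocator, in bytes */
           #ifndef TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES
           #define TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES 16
           #endif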

##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,35 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include <tvm_executor.h>\n";
+    code_ << "#ifdef __cplusplus\n";
+    code_ << "extern \"C\"\n";
+    code_ << "#endif\n";
+    code_ << "TVM_DLL int32_t " << ::tvm::runtime::symbol::tvm_run_func_prefix;
+    code_ << "(void* args, void* type_code, int num_args, void* out_value, void* "
+             "out_type_code, void* resource_handle);\n";
+    code_ << "const tvm_model_t network = {\n"
+          << "    .run_func = &" << ::tvm::runtime::symbol::tvm_run_func_prefix << ",\n"
+          << "    .num_input_tensors = " << aot_metadata_->num_inputs << ",\n"
+          << "    .num_output_tensors = " << aot_metadata_->num_outputs << ", \n"
+          << "};\n";
+  }
+
   void CreateSource() {
     if (target_->GetAttr<Bool>("system-lib").value_or(Bool(false)) && !func_names_.empty()) {
       CreateFuncRegistry();
       GenerateCrtSystemLib();
     }
+    if (target_->GetAttr<String>("executor").value_or("graph_runtime") == "aot") {

Review comment:
       "graph" and we should probably add a constant for this to e.g. `include/tvm/target/target_kind.h` next to `kTvmRuntimeCrt`

##########
File path: src/target/source/codegen_source_base.h
##########
@@ -175,8 +176,8 @@ runtime::Module DeviceSourceModuleCreate(
  * \param target the target the modules are compiled for.

Review comment:
       update docstring

##########
File path: tests/python/relay/aot/aot_test.mk
##########
@@ -0,0 +1,71 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
# Makefile to build the AOT test runner
+# Setup build environment
+#
+AOT_ROOT ?= $(TVM_ROOT)/src/runtime/crt/aot

Review comment:
       would the idea be to put this behind the project API?

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -211,21 +219,34 @@ void CodeGenCHost::PrintGetFuncFromBackend(const std::string& func_name,
   this->stream << "}\n";
 }
 
-void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, int num_args) {
+void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, PrimExpr values,
+                                 int num_args) {
   this->PrintIndent();
+  std::string stack_value = "stack_value";
+  if (const VarNode* stack_value_var = values.as<VarNode>()) {
+    stack_value = stack_value_var->name_hint;
+  }
   std::string ret_val = GetUniqueName("ret_val");
   std::string ret_type_code = GetUniqueName("ret_type_code");
   this->stream << "TVMValue " << ret_val << ";\n";
   this->PrintIndent();
   this->stream << "int " << ret_type_code << ";\n";
   this->PrintIndent();
-  this->stream << "if (TVMFuncCall(" << packed_func_name << ", "
-               << "(TVMValue*) stack_value"
-               << ", "
+
+  if (is_aot_executor_) {
+    this->stream << "if (" << packed_func_name << "( "
+                 << "(TVMValue*) " << stack_value;
+  } else {
+    this->stream << "if (TVMFuncCall(" << packed_func_name << ", "
+                 << "(TVMValue*) stack_value";
+  }
+  this->stream << ", "
                << "(int*) stack_tcode"
                << ", " << num_args << ", "
-               << "&" << ret_val << ", "
-               << "&" << ret_type_code << ") != 0) {\n";
+               << "&" << ret_val << ", ";
+  this->stream << "&" << ret_type_code;
+  this->stream << (is_aot_executor_ ? ", NULL" : "") << ") != 0) {\n";

Review comment:
       why do you need this change?
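
       (for context, the two branches emit calls of roughly this shape --
       reconstructed from the strings above, with `fused_op` and the
       argument count as placeholders:)

           /* graph executor path: dispatch through a packed-function handle */
           if (TVMFuncCall(fused_op, (TVMValue*) stack_value, (int*) stack_tcode,
                           3, &ret_val, &ret_type_code) != 0) { /* ... */ }

           /* AOT path: call the lowered function directly, passing a NULL
              resource_handle as the extra trailing argument */
           if (fused_op((TVMValue*) stack_value, (int*) stack_tcode,
                        3, &ret_val, &ret_type_code, NULL) != 0) { /* ... */ }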

##########
File path: cmake/modules/StandaloneCrt.cmake
##########
@@ -135,6 +137,7 @@ if(USE_MICRO)
     file(GLOB TEST_SRCS ${CMAKE_SOURCE_DIR}/tests/crt/*_test.cc)
     find_path(GTEST_INCLUDE_DIR gtest/gtest.h)
     find_library(GTEST_LIB gtest "$ENV{GTEST_LIB}")
+    set(aot_executor_src "${standalone_crt_base}/src/runtime/crt/aot/tvm_executor.c")

Review comment:
       why is this needed?

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -156,10 +170,11 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
     with open(tempdir.relpath("relay.txt"), "w") as f:
         f.write(str(mod.ir_mod))
 
-    graph_config_dir_path = tempdir.relpath(os.path.join("runtime-config", "graph"))
-    os.makedirs(graph_config_dir_path)
-    with open(os.path.join(graph_config_dir_path, "graph.json"), "w") as f:
-        f.write(mod.graph_json)
+    if not is_aot:

Review comment:
       is it possible to create an alternate `runtime-config/aot` containing whatever configuration you needed?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */

Review comment:
       update (this still says "Code generator for graph runtime")

##########
File path: include/tvm/tir/builtin.h
##########
@@ -343,6 +343,10 @@ TVM_DLL const Op& tvm_stack_make_array();
  */
 TVM_DLL const Op& tvm_call_packed();
 
+// This achieve the same of a packed call, but with an extern call
+// directly to the operator
+TVM_DLL const Op& tvm_call_unpacked();

Review comment:
       needed?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the

Review comment:
       maybe we can check in a v1 that uses lookup_linked_param and then continue to optimize from there? also, it might be the case that people want to override parameters sometimes, so it would be nice to avoid hardcoding it even if by default the lookup is e.g. handled as `.data` section initialization.
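
       (for illustration, ".data section initialization" would mean emitting
       each parameter as initialized data that the generated code references
       directly -- a sketch of the idea, not what this PR currently emits:)

           /* Parameter p0 baked into the image as initialized data, so no
              runtime lookup call is needed. */
           static const uint8_t p0_data[512] = {0 /* ..., weight bytes */};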

##########
File path: python/tvm/relay/backend/graph_executor_factory.py
##########
@@ -41,17 +41,18 @@ class GraphExecutorFactoryModule:
         The parameters of module
     """
 
-    def __init__(self, ir_mod, target, graph_json_str, libmod, libmod_name, params):

Review comment:
       is this change needed as a way to export metadata for the AOT runtime, since `tvm.relay.build`'s return value is fairly strict? what's this used for outside of `tvm.relay.build`?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GT(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node, we need to copy the content of the variable to the
+    // output. A Var node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    reverse_params_lookup_.Set(expr, name);
+
+    // If the Constant node is an output node, we need to copy the content of the parameter to
+    // the output. A Constant node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");

Review comment:
       I think it would be best to CHECK here
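
       For instance, a minimal sketch of what that could look like, using the
       LOG(FATAL) macro already used for the other unsupported cases in this
       file (illustrative wording only):

           void VisitExpr_(const LetNode* op) override {
             // TODO(giuseros): support Let nodes in AOT
             // Sketch: fail through LOG(FATAL) so the error is reported
             // consistently with the other unsupported-node cases above.
             LOG(FATAL) << "Let nodes are not yet implemented for the AOT executor";
           }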

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represent the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node, we need to copy the content of the variable to the
+    // output. A Var node can only produce a single output.

Review comment:
       would maybe add something to the effect of "it's safe to check the SID here because Var StorageTokens are never reallocated"
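
       e.g., a sketch of that comment in place (wording illustrative):

           // Safe to inspect the SID here: the StorageToken assigned to a Var
           // is never reallocated, so the expr -> sid mapping is stable.
           Array<IntegerArray> sids = storage_device_map_[expr];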

##########
File path: include/tvm/runtime/crt/aot/tvm_executor.h
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_executor.h
+ * \brief TVM Executor for the Ahead-of-Time Runtime
+ *
+ * AOT models are described by the TVM model descriptor format
+ * which can be passed to tvm_runtime_run. These descriptors will be
+ * generated by the AOT compilation process. This can optionally be
+ * augmented with platform specific context to be passed to the TVM
+ * operators.
+ *
+ * Example:
+ * extern tvm_model_t my_network;
+ * int main() {
+ *    void* data = get_data();
+ *    void* output[4] = {0, 0, 0, 0};
+ *    void* inputs[] = {data};
+ *    void* outputs[] = {output};
+ *    tvm_context_t my_context = {
+ *      .driver = ...,
+ *    };
+ *    tvm_runtime_run(
+ *      &my_network,
+ *      inputs,
+ *      outputs,
+ *      &my_context
+ *    );
+ *    return 0;
+ * }
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+
+#include <stdint.h>
+
+#include "tvm_backend.h"
+#include "tvm_error.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!

Review comment:
       I think it would be great to leave out speculative structures like this one for the v1 (but it's great to see the direction you're thinking of going here).

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases

Review comment:
       just a note: seems like maybe we need to improve the firmware-facing API so that we could just return a DLTensor instance, whether it was a dedicated output instance or a DLTensor pointing at the input. what are your thoughts?
   
   @tqchen 
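
       One hypothetical shape for that, purely illustrative and not part of
       this PR (the function name and signature are made up):

           // Hypothetical firmware-facing accessor: hand back a DLTensor that
           // may alias an input buffer instead of memcpy-ing into the output.
           const DLTensor* tvm_runtime_get_output(const tvm_model_t* model, int32_t index);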

##########
File path: src/runtime/crt/aot/tvm_executor.c
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+/*!
+ * \file src/runtime/crt/aot/tvm_executor.c
+ * \brief Internal implementation of the AOT Executor
+ */
+
+#include "tvm_executor.h"
+
+#include <dlpack/dlpack.h>
+
+#include "tvm_backend.h"
+#include "tvm_error.h"
+
+tvm_workspace_t* tvm_runtime_workspace;
+
+tvm_crt_error_t tvm_runtime_run(const tvm_model_t* model, void** inputs, void** outputs,
+                                tvm_context_t* context) {
+  static DLContext fake_ctx = {kDLCPU, 0};

Review comment:
       I think you need to update this to take into account the DLDevice refactor
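
       A minimal sketch of the updated line, assuming the post-refactor dlpack
       names (DLContext -> DLDevice):

           // DLContext was renamed to DLDevice in the dlpack refactor
           static DLDevice fake_device = {kDLCPU, 0};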

##########
File path: src/runtime/crt/aot/tvm_executor.c
##########
@@ -0,0 +1,91 @@
+tvm_crt_error_t tvm_runtime_run(const tvm_model_t* model, void** inputs, void** outputs,
+                                tvm_context_t* context) {
+  static DLContext fake_ctx = {kDLCPU, 0};
+  static int64_t fake_dims = 0;
+  static int64_t fake_shape = {0};
+
+  DLTensor tensors[model->num_input_tensors + model->num_output_tensors];     // NOLINT

Review comment:
       why NOLINT?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (reverse_params_lookup_.find(kv.first) != reverse_params_lookup_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);

Review comment:
       this is totally fine for now, but I think it would be good to do this outside of the "run" function, so that multiple inferences didn't have to pay the cost of re-allocating memory.
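
       As an illustration, a hypothetical split (the function names and the
       size below are placeholders, not part of this PR) could hoist the
       workspace allocation into a one-time init step and reuse the buffers
       across runs:

           // Hypothetical sketch of init-time allocation for the intermediate sids.
           static void* sid_buffer = NULL;

           int32_t network_init(void) {
             // Same arguments the codegen emits today: device_type=1 (CPU),
             // device_id=0, nbytes, dtype_code_hint=2, dtype_bits_hint=8.
             sid_buffer = TVMBackendAllocWorkspace(1, 0, 4096, 2, 8);
             return (sid_buffer == NULL) ? -1 : 0;
           }

           int32_t network_run(void* inputs, void* outputs) {
             /* ... operator calls reusing sid_buffer ... */
             return 0;
           }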

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,21 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {
   emit_asserts_ = emit_asserts;
+  is_aot_executor_ = is_aot_executor;
   declared_globals_.clear();
   decl_stream << "// tvm target: " << target_str << "\n";
   decl_stream << "#define TVM_EXPORTS\n";
-  decl_stream << "#include \"tvm/runtime/c_runtime_api.h\"\n";
-  decl_stream << "#include \"tvm/runtime/c_backend_api.h\"\n";
+  if (is_aot_executor) {

Review comment:
       rm, if you're ok w/ unifying the runtimes

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -324,15 +348,19 @@ inline void CodeGenCHost::PrintTernaryCondExpr(const T* op, const char* compare,
 }
 
 runtime::Module BuildCHost(IRModule mod, Target target) {
+  bool is_aot_executor = (target->GetAttr<String>("executor").value_or("graph_runtime") == "aot");

Review comment:
       probably just "graph" instead of "graph_runtime"
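
       i.e., something along the lines of (sketch):

           bool is_aot_executor = (target->GetAttr<String>("executor").value_or("graph") == "aot");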

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
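+    // The five call args correspond to TVMBackendAllocWorkspace(device_type,
+    // device_id, nbytes, dtype_code_hint, dtype_bits_hint):
+    // 1 = kDLCPU, device 0, memory_size_byte bytes, 2 = kDLFloat hint, 8 bits.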
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_code", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to concatenate its arguments into a single string
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GT(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // A Var node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    reverse_params_lookup_.Set(expr, name);
+
+    // If the Constant node is an output node we need to copy the content of the parameter to
+    // the output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override {
+    throw std::runtime_error("GlobalVarNode not supported");
+  }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (reverse_params_lookup_.find(kv.first) != reverse_params_lookup_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief module being generated */
+  runtime::Module* mod_;
+  std::vector<Expr> input_vars_;
+  Array<tir::Var> main_signature_;
+  /*! \brief target device */
+  TargetsMap targets_;
+  Target target_host_;
+  Map<String, ObjectRef> dict_attrs_;
+
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to lookup the parameter.
+   */
+  Map<Expr, String> reverse_params_lookup_;
+  std::unordered_map<std::string, runtime::NDArray> params_;
+  std::unordered_map<std::string, int64_t> param_storage_ids_;
+
+  /*! \brief plan memory of device result */
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  std::unordered_map<int, te::Var> sids_table_;
+  /*! \brief lowered funcs */
+  std::unordered_map<std::string, IRModule> lowered_funcs_;
+  /*! \brief name map */
+  std::unordered_map<std::string, size_t> name_map_;
+  /*! \brief compile engine */
+  CompileEngine compile_engine_;
+  /*! \brief GraphPlanMemory module */
+  runtime::Module graph_plan_memory_module_;
+  /*! \brief the IR module stored which represents the executor program */
+  Map<String, IRModule> tir_module_;
+  /*! \brief the set of statements that make the program */
+  std::vector<tir::Stmt> stmts_;
+  /*! \brief the list of return sids (note that the function might return more than one output) */
+  IntegerArray return_sid_;
+
+ public:
+  AOTCodegen(runtime::Module* mod, const TargetsMap& targets, Target target_host)
+      : mod_(mod), return_sid_() {
+    compile_engine_ = CompileEngine::Global();
+    targets_ = targets;
+    target_host_ = target_host;
+    dict_attrs_.Set("global_symbol", runtime::String("tvm__run_func"));
+  }
+
+  AOTLoweredOutput Codegen(relay::Function func) {
+    // Get the module, storage map and token sizes
+    auto pf = GetPackedFunc("relay.backend.GraphPlanMemory");
+    storage_device_map_ = (*pf)(func);
+
+    int input_index = 0;
+    for (auto input : func->params) {
+      input_vars_.push_back(input);
+      main_signature_.push_back(tir::Var(make_string("input_", input_index), DataType::Handle()));
+    }
+
+    // Define the storage allocator ids
+    for (auto kv : storage_device_map_) {
+      for (const auto& sid : kv.second[0]) {
+        te::Var sid_var(make_string("sid_", sid), DataType::Handle());
+        sids_table_[sid] = sid_var;
+      }
+    }
+
+    // Find the return sid
+    return_sid_ = AotReturnSidVisitor(storage_device_map_).FindReturnSid(func);
+    for (unsigned int output_index = 0; output_index < return_sid_.size(); output_index++) {
+      main_signature_.push_back(tir::Var(make_string("output_", output_index), DataType::Handle()));
+    }
+
+    VisitExpr(func->body);
+
+    auto prim_func = CreateMainFunc(func->params.size());
+    AOTLoweredOutput ret;
+
+    ret.params = std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>>();
+    for (auto param : params_) {
+      ret.params.emplace(std::make_pair(
+          param.first,
+          std::make_pair(static_cast<int>(param_storage_ids_[param.first]), param.second)));
+    }
+
+    for (auto& kv : lowered_funcs_) {
+      if (ret.lowered_funcs.count(kv.first) == 0) {
+        ret.lowered_funcs.Set(kv.first, IRModule(Map<GlobalVar, BaseFunc>({})));
+      }
+      auto& mod = ret.lowered_funcs[kv.first];
+      mod->Update(kv.second);
+      ret.lowered_funcs.Set(kv.first, mod);
+    }
+    ret.external_mods = compile_engine_->LowerExternalFunctions();
+
+    auto target_host_str = target_host_->str();
+    if (ret.lowered_funcs.find(target_host_str) != ret.lowered_funcs.end()) {
+      ret.lowered_funcs[target_host_str]->Add(
+          GlobalVar(::tvm::runtime::symbol::tvm_run_func_prefix), prim_func);
+    } else {
+      Map<GlobalVar, BaseFunc> symbol_map;
+      symbol_map.Set(GlobalVar(::tvm::runtime::symbol::tvm_run_func_prefix), prim_func);
+      ret.lowered_funcs.Set(target_host_str, IRModule(symbol_map));
+    }
+
+    ret.graph_tir = PrettyPrint(prim_func);
+    ret.aot_metadata = runtime::AOTMetadata(input_vars_.size(), return_sid_.size());
+    return ret;
+  }
+};
+
+class AOTCodegenModule : public runtime::ModuleNode {
+ public:
+  AOTCodegenModule() {}
+  virtual PackedFunc GetFunction(const std::string& name, const ObjectPtr<Object>& sptr_to_self) {
+    if (name == "init") {
+      return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {

Review comment:
       minor style thing--can you put each PackedFunc body into a C++ member function?
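
       For example, roughly (Init as a member function is a made-up name):

           virtual PackedFunc GetFunction(const std::string& name, const ObjectPtr<Object>& sptr_to_self) {
             if (name == "init") {
               return PackedFunc(
                   [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { this->Init(args, rv); });
             }
             return PackedFunc();
           }

           void Init(TVMArgs args, TVMRetValue* rv) {
             // former lambda body moves here
           }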

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -355,6 +394,12 @@ runtime::Module BuildCHost(IRModule mod, Target target) {
     cg.LinkParameters(linked_params);
   }
 
+  if (is_aot_executor) {
+    ICHECK(aot_executor_fn.defined())
+        << "When using aot executor the executor function should be defined";

Review comment:
       can you just state the criteria for this function in the error? what prefix are you looking for?
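
       e.g., using the symbol this PR already defines, something along the lines of:

           ICHECK(aot_executor_fn.defined())
               << "When executor=aot is set, the IRModule must contain a PrimFunc whose "
               << "global symbol is '" << ::tvm::runtime::symbol::tvm_run_func_prefix << "'";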

##########
File path: src/tir/op/builtin.cc
##########
@@ -174,6 +174,9 @@ TIR_DEFINE_BUILTIN_FUNC(tvm_stack_make_array)
 TIR_DEFINE_BUILTIN_FUNC(tvm_call_packed)
     .set_attr<TCallEffectKind>("TCallEffectKind", Integer(CallEffectKind::kOpaque));
 
+TIR_DEFINE_BUILTIN_FUNC(tvm_call_unpacked)

Review comment:
       do you need this?

##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,35 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include <tvm_executor.h>\n";
+    code_ << "#ifdef __cplusplus\n";
+    code_ << "extern \"C\"\n";
+    code_ << "#endif\n";
+    code_ << "TVM_DLL int32_t " << ::tvm::runtime::symbol::tvm_run_func_prefix;
+    code_ << "(void* args, void* type_code, int num_args, void* out_value, void* "
+             "out_type_code, void* resource_handle);\n";
+    code_ << "const tvm_model_t network = {\n"

Review comment:
       what if multiple models are linked into a single binary? adding @stoa as well to comment on the interface
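
       For the sake of discussion, one option would be to prefix the generated symbols with a per-model name; sketch only, not what this PR emits:

           TVM_DLL int32_t tvm__run_func_modelA(void* args, void* type_code, int num_args,
                                                void* out_value, void* out_type_code,
                                                void* resource_handle);
           TVM_DLL int32_t tvm__run_func_modelB(void* args, void* type_code, int num_args,
                                                void* out_value, void* out_type_code,
                                                void* resource_handle);

           const tvm_model_t modelA_network = { /* wired to tvm__run_func_modelA */ };
           const tvm_model_t modelB_network = { /* wired to tvm__run_func_modelB */ };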

##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -247,7 +247,7 @@ class BuiltinLower : public StmtExprMutator {
     Array<PrimExpr> packed_args = {op->args[0], stack_value_, stack_tcode_,
                                    ConstInt32(arg_stack_begin),
                                    ConstInt32(arg_stack_begin + op->args.size() - 1)};
-    return Call(DataType::Int(32), builtin::tvm_call_packed_lowered(), packed_args);
+    return Call(op->dtype, builtin::tvm_call_packed_lowered(), packed_args);

Review comment:
       @tqchen I have no idea if 32 is needed here.

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to lookup the parameter.
+   */
+  Map<Expr, String> reverse_params_lookup_;
+  std::unordered_map<std::string, runtime::NDArray> params_;

Review comment:
       add docs for these (key and value)
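
       e.g., a sketch (wording is mine, based on how VisitExpr_(const ConstantNode*) fills these in):

           /*! \brief Maps a ConstantNode's Expr to its generated parameter name ("p0", "p1", ...) */
           Map<Expr, String> reverse_params_lookup_;
           /*! \brief Maps a generated parameter name to the NDArray holding that constant's value */
           std::unordered_map<std::string, runtime::NDArray> params_;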

##########
File path: src/relay/backend/graph_plan_memory.cc
##########
@@ -209,15 +209,17 @@ class StorageAllocator : public StorageAllocaBaseVisitor {
     for (const auto& kv : token_map_) {
       std::vector<Integer> storage_ids;
       std::vector<Integer> device_types;
+      std::vector<Integer> sid_sizes;

Review comment:
       add units wherever you refer to this
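
       e.g., making the unit part of the name:

           std::vector<Integer> sid_size_bytes;  // size of each storage id, in bytes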

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression assicated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * brief Given an expression return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node, we need to copy the content of the variable to the
+    // output. A Var node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    reverse_params_lookup_.Set(expr, name);
+
+    // If the Constant node is an output node, we need to copy the content of the parameter to
+    // the output. A Constant node can only produce a single output.
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    throw std::invalid_argument("Let not yet implemented in AOT");
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override {
+    throw std::runtime_error("GlobalVar not supported in AOT");
+  }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (reverse_params_lookup_.find(kv.first) != reverse_params_lookup_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief module being built */
+  runtime::Module* mod_;
+  std::vector<Expr> input_vars_;
+  Array<tir::Var> main_signature_;
+  /*! \brief target device */
+  TargetsMap targets_;
+  Target target_host_;
+  Map<String, ObjectRef> dict_attrs_;
+
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to lookup the parameter.
+   */
+  Map<Expr, String> reverse_params_lookup_;

Review comment:
       Maybe params_by_expr_ or something. Can you also add documentation explaining the NDArray (e.g. what's in it)?
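       Something like this, perhaps (a rough sketch; the exact wording is up to you):

           /*!
            * \brief Maps the Relay Expr of a ConstantNode to its generated parameter
            * name (e.g. "p0"). The constant's data is the NDArray stored in params_
            * under the same name, and param_storage_ids_ maps the name to the
            * storage_id used to look the parameter up at runtime.
            */
           Map<Expr, String> params_by_expr_;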




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-814382489


   Hi @areusch, just back from holidays.
   
   First of all, thank you so much for being so thorough in the review. I appreciate it.
   
   Instead of one big reply, I will reply to your comments gradually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608175049



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -156,10 +170,11 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
     with open(tempdir.relpath("relay.txt"), "w") as f:
         f.write(str(mod.ir_mod))
 
-    graph_config_dir_path = tempdir.relpath(os.path.join("runtime-config", "graph"))
-    os.makedirs(graph_config_dir_path)
-    with open(os.path.join(graph_config_dir_path, "graph.json"), "w") as f:
-        f.write(mod.graph_json)
+    if not is_aot:

Review comment:
       Well the point is that I don't need any configuration with AOT. This is only saying "don't store the json if it is using the aot executor"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r621226575



##########
File path: src/target/source/codegen_source_base.h
##########
@@ -155,7 +156,8 @@ runtime::Module CSourceModuleCreate(const String& code, const String& fmt,
  */
 runtime::Module CreateMetadataModule(
     const std::unordered_map<std::string, runtime::NDArray>& params, runtime::Module target_module,
-    const Array<runtime::Module>& ext_modules, Target target);
+    const Array<runtime::Module>& ext_modules, Target target,

Review comment:
       Actually, it doesn't seem a bad idea to remove the default and add empty metadata in `compiler.cc`, so it is clear that we don't need compiled metadata in the VM executor.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611852863



##########
File path: include/tvm/runtime/crt/stack_memory.h
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+#ifndef TVM_RUNTIME_CRT_STACK_MEMORY_H_
+#define TVM_RUNTIME_CRT_STACK_MEMORY_H_
+#include <stddef.h>
+#include <stdint.h>
+
+#include "error_codes.h"
+
+/*! Memory alignment for allocator */
+
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT
+#define TVM_RUNTIME_ALLOC_ALIGNMENT 16
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct {
+  uint8_t* next_alloc;   /** Pointer to the next free byte in the workspace */

Review comment:
       Yes, I clarified this
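       For reference, the intended semantics are roughly these (a sketch only, assuming
       the workspace/workspace_size fields of tvm_workspace_t; not the exact
       implementation):

           tvm_crt_error_t StackMemoryManager_Allocate(tvm_workspace_t* ws, int32_t nbytes,
                                                       void** out_ptr) {
             // Round the request up to the configured alignment.
             uint32_t size = (nbytes + TVM_RUNTIME_ALLOC_ALIGNMENT - 1) &
                             ~(TVM_RUNTIME_ALLOC_ALIGNMENT - 1);
             // next_alloc always points at the first free byte of the workspace.
             if (ws->next_alloc + size > ws->workspace + ws->workspace_size) {
               return kTvmErrorPlatformNoMemory;
             }
             *out_ptr = ws->next_alloc;
             ws->next_alloc += size;
             return kTvmErrorNoError;
           }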




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615005314



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Hi @areusch , 
   I am on board with 2., but since we are in agreement I would like to make 1. work. This is to have better performance and better code from day 1. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-828660869


   @giuseros I agree it wouldn't affect tuning logs, but we may need to keep `executor=` around for a long time if we add it to `relay.build`.
   
   > Also, in general, why adding something to the target if we know we want to remove it entirely (and when it is so straightforward to express as a relay.build parameter)?
   
   The problem in my mind is the precedent. Here are some more things that could be relay.build parameters:
   - `--link-params`
   - `--runtime`
   - parameters from @Mousius 's recent [RFC](https://discuss.tvm.apache.org/t/rfc-utvm-aot-optimisations-for-embedded-targets/9849)
   
   If we add all of these as a top-level API, it's going to get difficult to manage. See my pushback on [PR 7823](https://github.com/apache/tvm/pull/7823). I'd rather continue with the single dirty namespace we have now (Target) rather than dirtying two.
   
   I'm not sure I see a direct effect on `tvmc` with this decision--it will complicate things in the short term, but it could still accept `--executor` as a standalone argument, and modify `--target` as needed until the new `compiler_opts` kwarg is merged (presuming we go that route).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614865828



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       To add a bit on call_packed: we could introduce another series of `call_cpacked` intrinsics, which do not do the string lookup. This way we do not need to infer the PackedFunc call variant from a flag (we directly reflect it in the IR).
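       Concretely, the difference in the generated C code would look roughly like this
       (function name and the surrounding declarations are illustrative):

           TVMValue args[1]; int arg_type_ids[1]; int num_args = 1;
           TVMValue ret_val; int ret_tcode;

           /* tvm_call_packed: resolve the callee by name at run time. */
           static void* fused_add_packed = NULL;
           if (fused_add_packed == NULL &&
               TVMBackendGetFuncFromEnv(__tvm_module_ctx, "fused_add", &fused_add_packed) != 0) {
             return -1;
           }
           TVMFuncCall(fused_add_packed, args, arg_type_ids, num_args, &ret_val, &ret_tcode);

           /* call_cpacked: call the PackedFunc symbol directly, no string lookup. */
           fused_add(args, arg_type_ids, num_args, &ret_val, &ret_tcode);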




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608904869



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the

Review comment:
       oh I understand. let's see if Ziheng has commentary on the other thread.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615725747



##########
File path: src/relay/backend/build_module.cc
##########
@@ -59,17 +64,50 @@ struct BuildOutput {
  */
 struct GraphCodegen {
  public:
-  GraphCodegen() {
-    auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
-    mod = (*pf)();
+  explicit GraphCodegen(Target target_host) : target_host_(target_host) {
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);
+    if (executor_str == kTvmExecutorGraph) {
+      executor_ = ExecutorType::Graph;
+      auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
+      mod = (*pf)();
+    } else if (executor_str == kTvmExecutorAot) {

Review comment:
       I think we'd need an ExecutorCodegen base class with GraphCodegen and AOTCodegen specializations. The switch between which implementation to run should be done at a higher level, in build_module.cc. That way we can avoid having to check which executor is in use inside each of these functions. WDYT?
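       Roughly something like this (a sketch; names are illustrative):

           class ExecutorCodegen {
            public:
             virtual ~ExecutorCodegen() = default;
             virtual void Init(runtime::Module* m, TargetsMap targets) = 0;
             virtual void Codegen(const Function& func) = 0;
           };

           class GraphCodegen : public ExecutorCodegen {
            public:
             void Init(runtime::Module* m, TargetsMap targets) override { /* graph init */ }
             void Codegen(const Function& func) override { /* graph codegen */ }
             std::string GetJSON();  // graph-specific output
           };

           class AOTCodegen : public ExecutorCodegen {
            public:
             void Init(runtime::Module* m, TargetsMap targets) override { /* aot init */ }
             void Codegen(const Function& func) override { /* aot codegen */ }
             tir::PrimFunc GetRunnerFunction();  // aot-specific output
           };

       build_module.cc would then pick the subclass once, based on the executor
       attribute, and the rest of the flow stays executor-agnostic.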
   

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {

Review comment:
       We should make this a subclass of a LoweredOutput that could be made common between the Graph and AoT codegens. This is essentially the output of each of the executor codegens. This will help in having generic interfaces for creating the metadata module and for passing around special attributes (e.g., runner_func, json) that are specific to each executor.

##########
File path: src/relay/backend/build_module.cc
##########
@@ -101,7 +139,18 @@ struct GraphCodegen {
     return ret;
   }
 
+  runtime::AOTMetadata GetAOTMetadata() {

Review comment:
        If the relevant information is available in the LoweredOutput of AoTExecutorCodegen, we should not need to get this one separately -- also see my comment where this information is passed on to CreateMetadataModule(...)

##########
File path: src/relay/backend/build_module.cc
##########
@@ -43,12 +43,17 @@ namespace backend {
 using TargetsMap = Map<tvm::Integer, tvm::Target>;
 using namespace tvm::relay::transform;
 
+/*!
+ * Type of supported executors
+ */
+enum class ExecutorType { Graph, Aot };
+
 /*!
  * \brief Output of building module
- *
  */
 struct BuildOutput {
   std::string graph_json;
+  tir::PrimFunc runner_function;

Review comment:
       I am not exactly sure why we need the runner_function in BuildOutput, because I thought it only needed to be visible to the MetadataModule, as it will be pointed to by tvm_model_t.
   
       If we really need to have it, I think it's better if we can have a parent class BuildOutput that contains just the runtime.Module and params, and specialize it to contain either the graph JSON or the runner function. It's confusing to me to have a struct that has both graph_json and runner_function. WDYT?

##########
File path: src/relay/backend/build_module.cc
##########
@@ -59,17 +64,50 @@ struct BuildOutput {
  */
 struct GraphCodegen {
  public:
-  GraphCodegen() {
-    auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
-    mod = (*pf)();
+  explicit GraphCodegen(Target target_host) : target_host_(target_host) {
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);
+    if (executor_str == kTvmExecutorGraph) {
+      executor_ = ExecutorType::Graph;
+      auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
+      mod = (*pf)();
+    } else if (executor_str == kTvmExecutorAot) {
+      executor_ = ExecutorType::Aot;
+      auto pf = GetPackedFunc("relay.build_module._GraphAOTCodegen");
+      mod = (*pf)();
+    } else {
+      LOG(FATAL) << "Executor " << executor_str << " not supported";
+    }
   }
   ~GraphCodegen() {}
 
-  void Init(runtime::Module* m, TargetsMap targets) { CallFunc("init", m, targets); }
+  void Init(runtime::Module* m, TargetsMap targets) {
+    if (executor_ == ExecutorType::Graph) {
+      CallFunc("init", m, targets);
+    } else if (executor_ == ExecutorType::Aot) {
+      CallFunc("init", m, targets, target_host_);
+    } else {
+      LOG(FATAL) << "Executor not supported";
+    }
+  }
 
   void Codegen(const Function& func) { CallFunc("codegen", func); }
 
-  std::string GetJSON() { return CallFunc<std::string>("get_graph_json", nullptr); }
+  std::string GetJSON() {
+    if (executor_ == ExecutorType::Graph) {
+      return CallFunc<std::string>("get_graph_json", nullptr);
+    } else {
+      return "";
+    }
+  }
+
+  tir::PrimFunc GetRunnerFunction() {

Review comment:
       This should only be implemented in AOTCodegen.

##########
File path: src/relay/backend/build_module.cc
##########
@@ -59,17 +64,50 @@ struct BuildOutput {
  */
 struct GraphCodegen {
  public:
-  GraphCodegen() {
-    auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
-    mod = (*pf)();
+  explicit GraphCodegen(Target target_host) : target_host_(target_host) {
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);
+    if (executor_str == kTvmExecutorGraph) {
+      executor_ = ExecutorType::Graph;
+      auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
+      mod = (*pf)();
+    } else if (executor_str == kTvmExecutorAot) {
+      executor_ = ExecutorType::Aot;
+      auto pf = GetPackedFunc("relay.build_module._GraphAOTCodegen");
+      mod = (*pf)();
+    } else {
+      LOG(FATAL) << "Executor " << executor_str << " not supported";
+    }
   }
   ~GraphCodegen() {}
 
-  void Init(runtime::Module* m, TargetsMap targets) { CallFunc("init", m, targets); }
+  void Init(runtime::Module* m, TargetsMap targets) {
+    if (executor_ == ExecutorType::Graph) {
+      CallFunc("init", m, targets);
+    } else if (executor_ == ExecutorType::Aot) {
+      CallFunc("init", m, targets, target_host_);
+    } else {
+      LOG(FATAL) << "Executor not supported";
+    }
+  }
 
   void Codegen(const Function& func) { CallFunc("codegen", func); }
 
-  std::string GetJSON() { return CallFunc<std::string>("get_graph_json", nullptr); }
+  std::string GetJSON() {
+    if (executor_ == ExecutorType::Graph) {
+      return CallFunc<std::string>("get_graph_json", nullptr);
+    } else {

Review comment:
       I think we can avoid these, if you agree with the above comment.

##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,15 +540,17 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
+    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen(target_host));

Review comment:
       We should not overload aot_codegen as graph_codegen -- IMHO, the switch for which executor to use should be reflected somewhere at this source level.

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace

Review comment:
       @giuseros, if it's not too much trouble let's move to tir::Allocate -- I think it's too small a change to require its own PR (if I am not mistaken)
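       The change would look something like this inside CreateMainFunc (a sketch;
       the dtype/extent choices are illustrative):

           // Instead of: body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
           // wrap the body in a tir::Allocate so the TIR passes can see and plan the storage:
           body = tir::Allocate(sids_table_[sid], DataType::UInt(8), {ConstInt32(size)},
                                tir::const_true(), body);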

##########
File path: src/relay/backend/build_module.cc
##########
@@ -524,7 +593,8 @@ class RelayBuildModule : public runtime::ModuleNode {
     }
 
     auto ext_mods = graph_codegen_->GetExternalModules();
-    ret_.mod = tvm::codegen::CreateMetadataModule(ret_.params, ret_.mod, ext_mods, GetTargetHost());
+    ret_.mod = tvm::codegen::CreateMetadataModule(ret_.params, ret_.mod, ext_mods, GetTargetHost(),
+                                                  graph_codegen_->GetAOTMetadata());

Review comment:
       It seems a bit ad hoc to include just the AOTMetadata here. I'd suggest including the LoweredOutput here; that way CreateMetadataModule will know how to consume the output of the ExecutorCodegen to create the correct metadata module. WDYT?
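       i.e. something like this (a sketch of the signature only):

           runtime::Module CreateMetadataModule(const LoweredOutput& lowered_output,
                                                runtime::Module target_module,
                                                const Array<runtime::Module>& ext_modules,
                                                Target target);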




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614807239



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       As I said on the discuss post, I agree about having a separate discussion on Memory management, PackedFunc and firmware-facing API. I would say that none of this is covered in this PR. In this PR, we have:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r623140515



##########
File path: tests/python/relay/aot/aot_test_utils.py
##########
@@ -108,11 +109,18 @@ def create_main(test_name, input_list, output_list, output_path):
         main_file.write("tvm_runtime_run(&network, inputs, outputs);")
 
         for i in range(0, len(output_list)):
+            is_real_dtype = output_list[i].dtype == "float32"

Review comment:
       maybe is_float_dtype, wasn't sure what real vs fake dtype was when I read this :)

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -0,0 +1,672 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_executor_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTExecutorCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> PackSid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(MakeString("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var PackParam(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[params_by_expr_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(MakeString("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(MakeString("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> FindExpr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (params_by_expr_.find(arg) != params_by_expr_.end()) {
+      // Parameter of the network
+      return {PackParam(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return PackSid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void CreateFuncCall(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> CreateFuncCall_stmts;

Review comment:
       nit: create_func_call_stmts https://google.github.io/styleguide/cppguide.html#Variable_Names

##########
File path: tests/crt/aot_memory_test.cc
##########
@@ -28,17 +28,20 @@ TEST(AOTMemory, Allocate) {
   tvm_workspace_t tvm_runtime_workspace;
 
   StackMemoryManager_Init(&tvm_runtime_workspace, model_memory, 96);
-
-  void* block_one = StackMemoryManager_Allocate(&tvm_runtime_workspace, 1);
+  void* block_one = NULL;
+  StackMemoryManager_Allocate(&tvm_runtime_workspace, 1, &block_one);

Review comment:
       can you assert on the return code too?
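       e.g. something like this (assuming the allocator returns the CRT error enum):

           tvm_crt_error_t err = StackMemoryManager_Allocate(&tvm_runtime_workspace, 1, &block_one);
           ASSERT_EQ(err, kTvmErrorNoError);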




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r610796796



##########
File path: include/tvm/runtime/crt/aot_executor.h
##########
@@ -68,6 +67,9 @@ extern "C" {
 typedef struct {
 } tvm_context_t;
 
+typedef int32_t(tvm_function_t)(void* args, void* arg_type_ids, int32_t num_args,

Review comment:
       can you use `TVMBackendPackedCFunc`?
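       For reference, the existing typedef in c_backend_api.h is (roughly):

           typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                                TVMValue* out_ret_value, int* out_ret_tcode,
                                                void* resource_handle);

       so the firmware-facing alias could reuse it instead of redefining the signature.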

##########
File path: include/tvm/target/target_kind.h
##########
@@ -140,6 +140,12 @@ static constexpr const char* kTvmRuntimeCpp = "c++";
 /*! \brief Value used with --runtime in target specs to indicate the C runtime. */
 static constexpr const char* kTvmRuntimeCrt = "c";
 
+/*! \brief Value used with --executor in target specs to indicate the graph executor. */
+static constexpr const char* kTvmExecutorGraph = "graph";

Review comment:
       @jroesch @tqchen can you look at this interface and see if you agree?

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -0,0 +1,216 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Graph executor factory."""
+import warnings
+from ..._ffi.base import string_types
+from ..._ffi.registry import get_global_func
+from ...runtime import ndarray
+from tvm import tir
+
+
+class ExecutorFactoryModule:
+    """Graph executor factory module.
+    This is a module of graph executor factory
+
+    Parameters
+    ----------
+    graph_str : str
+        Depending on executor:
+        * Graph executor: the graph to be deployed in json format output by graph compiler.
+        The graph can contain operator(tvm_op) that points to the name of
+        PackedFunc in the libmod.
+        * AOT executor: the string representation of the TIR executor PrimFunc
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def get_internal_repr(self):

Review comment:
       can you document these since it is now an interface and raise NotImplementedError? If these are required to be implemented, can you make this class use abc.ABCMeta and decorate these functions as abc.abstractmethod?
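       e.g. a sketch of the shape I have in mind (docstring wording is illustrative):

           import abc

           class ExecutorFactoryModule(metaclass=abc.ABCMeta):
               """Common interface for executor factory modules returned by relay.build."""

               @abc.abstractmethod
               def get_internal_repr(self):
                   """Return the executor-specific program representation.

                   Graph executor: the graph JSON string.
                   AOT executor: the TIR runner PrimFunc.
                   """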

##########
File path: python/tvm/relay/build_module.py
##########
@@ -197,6 +201,9 @@ def get_params(self):
             ret[key] = value.data
         return ret
 
+    def get_executor(self):

Review comment:
       add a docstring

##########
File path: src/runtime/crt/common/aot_backend_api.c
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <assert.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <tvm/runtime/c_backend_api.h>
+#include <tvm/runtime/crt/error_codes.h>
+#include <tvm/runtime/crt/logging.h>
+#include <tvm/runtime/crt/platform.h>
+
+#include "crt_config.h"
+
+void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes, int dtype_code_hint,

Review comment:
       I feel like these implementations are basically identical to those in `src/runtime/crt/common/crt_backend_api.c`. I prefer not to introduce a copy of these functions and would rather split the library `src/runtime/crt/common` into e.g. `src/runtime/crt/backend` and `src/runtime/crt/runtime`.

##########
File path: python/tvm/relay/build_module.py
##########
@@ -245,8 +252,10 @@ def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"
 
     Returns
     -------
-    graph_json : str
-        The json string that can be accepted by graph executor.
+    graph : str

Review comment:
       let's keep graph_json

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -132,14 +135,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = False
+    for v in mod.target.values():
+        if v.attrs.get("executor", "graph_runtime") == "aot":
+            is_aot = True
+            break
+
+    runtime = ["graph"]
+    if is_aot:
+        runtime = ["aot"]
+
     metadata = {
         "version": 1,
         "model_name": mod.libmod_name,
         "export_datetime": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%SZ"),
-        "memory": _build_memory_map(mod.graph_json),
+        "memory": _build_memory_map(mod.graph),
         "target": {int(k): str(v) for k, v in mod.target.items()},
-        "runtimes": ["graph"],
+        "runtimes": runtime,

Review comment:
       Yeah, it looks like we missed a rename -- it should be `executors`. Let's do this in another PR; I think we may need to rev the MLF version.

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -0,0 +1,216 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Graph executor factory."""
+import warnings
+from ..._ffi.base import string_types
+from ..._ffi.registry import get_global_func
+from ...runtime import ndarray
+from tvm import tir
+
+
+class ExecutorFactoryModule:
+    """Graph executor factory module.

Review comment:
       clarify the comment

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -0,0 +1,216 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Graph executor factory."""

Review comment:
       can you clarify the comment?

##########
File path: src/runtime/crt/memory/stack_memory.c
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_memory.h>
+
+void* MemoryManager_Allocate(tvm_workspace_t *tvm_runtime_workspace, int32_t nbytes) {

Review comment:
       Should this be StackMemoryManager_Allocate?
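
   For reference, a minimal usage sketch under the proposed `StackMemoryManager_` prefix (the init helper name here is an assumption for illustration):

```c
#include <stdint.h>
#include <tvm/runtime/crt/stack_memory.h>

#define WORKSPACE_SIZE (16 * 1024)
static uint8_t g_workspace[WORKSPACE_SIZE];

void example(void) {
  tvm_workspace_t ws;
  // Hypothetical init helper: point the workspace at a static buffer.
  StackMemoryManager_Init(&ws, g_workspace, WORKSPACE_SIZE);
  // Bump-allocate 128 bytes, rounded up to TVM_RUNTIME_ALLOC_ALIGNMENT.
  void* buf = StackMemoryManager_Allocate(&ws, 128);
  (void)buf;
}
```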

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -0,0 +1,216 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Graph executor factory."""
+import warnings
+from ..._ffi.base import string_types
+from ..._ffi.registry import get_global_func
+from ...runtime import ndarray
+from tvm import tir
+
+
+class ExecutorFactoryModule:
+    """Graph executor factory module.
+    This is a module of graph executor factory
+
+    Parameters
+    ----------
+    graph_str : str
+        Depending on executor:
+        * Graph executor: the graph to be deployed in json format output by graph compiler.
+        The graph can contain operator(tvm_op) that points to the name of
+        PackedFunc in the libmod.
+        * AOT executor: the string representation of the TIR executor PrimFunction
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def get_internal_repr(self):
+        return self.internal_repr
+
+    def get_params(self):
+        return None
+
+    def get_lib(self):
+        return None
+
+    def __getitem__(self, item):
+        return None
+
+    def __iter__(self):
+        warnings.warn(
+            "legacy graph executor behavior of producing json / lib / params will be "
+            "removed in the next release."
+            " Please see documents of tvm.contrib.graph_executor.GraphModule for the "
+            " new recommended usage.",
+            DeprecationWarning,
+            2,
+        )
+        return self
+
+    def save_config(self, config_path):
+        pass
+
+    def __next__(self):
+        if self.iter_cnt > 2:
+            raise StopIteration
+
+        objs = [self.internal_repr, self.lib, self.params]
+        obj = objs[self.iter_cnt]
+        self.iter_cnt += 1
+        return obj
+
+
+class AOTExecutorFactoryModule(ExecutorFactoryModule):
+    """Graph executor factory module.
+    This is a module of graph executor factory
+
+    Parameters
+    ----------
+    graph_str : str
+        Depending on executor:
+        * Graph executor: the graph to be deployed in json format output by graph compiler.
+        The graph can contain operator(tvm_op) that points to the name of
+        PackedFunc in the libmod.
+        * AOT executor: the string representation of the TIR executor PrimFunction
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def __init__(self, ir_mod, target, runner_function, libmod, libmod_name, params):
+        assert isinstance(runner_function, tir.PrimFunc)
+        args = []
+        for k, v in params.items():
+            args.append(k)
+            args.append(ndarray.array(v))
+
+        self.ir_mod = ir_mod
+        self.target = target
+        self.internal_repr = runner_function
+        self.lib = libmod
+        self.libmod_name = libmod_name
+        self.params = params
+        self.iter_cnt = 0
+
+    # Sometimes we want to get params explicitly.
+    # For example, we want to save its params value to
+    # an independent file.
+    def get_params(self):
+        return self.params
+
+    def get_runner_function(self):
+        return self.internal_repr
+
+    def get_lib(self):
+        return self.lib
+
+    def __getitem__(self, item):
+        return self.module.__getitem__(item)
+
+    def __iter__(self):

Review comment:
       Seems like this and `__next__` will fall over in this case? Can you add some tests, or delete them?
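
   If they are kept, a quick test along these lines would pin the behavior down (a sketch; `aot_factory` is a hypothetical fixture produced by `relay.build` with the AOT executor):

```python
import tvm


def test_aot_factory_legacy_unpacking(aot_factory):
    # Legacy 3-tuple unpacking goes through __iter__/__next__; for AOT the
    # first element would be the runner PrimFunc rather than the graph JSON.
    runner, lib, params = aot_factory
    assert isinstance(runner, tvm.tir.PrimFunc)
    assert params is not None
```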

##########
File path: tests/python/relay/aot/infra.py
##########
@@ -83,7 +84,23 @@ def create_main(test_name, input_list, output_list, output_path):
             main_file.write('#include "output_data%i.h"\n' % i)
 
         main_file.write("extern tvm_model_t network;\n")
-        main_file.write("extern tvm_workspace_t *tvm_runtime_workspace;\n")
+        main_file.write("tvm_workspace_t app_workspace;\n")
+        main_file.write(
+            """
+tvm_crt_error_t TVMPlatformMemoryAllocate(size_t num_bytes, DLDevice dev, void** out_ptr){

Review comment:
       nit: space between `)` and `{` here and on line 94
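
   For context, a sketch of how these generated hooks might route into the stack allocator from this PR (header name per the later rename in this thread; the free hook is a placeholder, since the matching free signature is not shown here):

```c
#include <dlpack/dlpack.h>
#include <stddef.h>
#include <tvm/runtime/crt/error_codes.h>
#include <tvm/runtime/crt/stack_allocator.h>

extern tvm_workspace_t app_workspace;

tvm_crt_error_t TVMPlatformMemoryAllocate(size_t num_bytes, DLDevice dev, void** out_ptr) {
  (void)dev;
  *out_ptr = StackMemoryManager_Allocate(&app_workspace, (int32_t)num_bytes);
  return (*out_ptr != NULL) ? kTvmErrorNoError : kTvmErrorPlatformNoMemory;
}

tvm_crt_error_t TVMPlatformMemoryFree(void* ptr, DLDevice dev) {
  (void)ptr;
  (void)dev;
  // The stack allocator releases in LIFO order; a real hook would call the
  // matching StackMemoryManager_Free here (signature assumed).
  return kTvmErrorNoError;
}
```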

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -0,0 +1,216 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Graph executor factory."""
+import warnings
+from ..._ffi.base import string_types
+from ..._ffi.registry import get_global_func
+from ...runtime import ndarray
+from tvm import tir
+
+
+class ExecutorFactoryModule:
+    """Graph executor factory module.
+    This is a module of graph executor factory
+
+    Parameters
+    ----------
+    graph_str : str
+        Depending on executor:
+        * Graph executor: the graph to be deployed in json format output by graph compiler.
+        The graph can contain operator(tvm_op) that points to the name of
+        PackedFunc in the libmod.
+        * AOT executor: the string representation of the TIR executor PrimFunction
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def get_internal_repr(self):
+        return self.internal_repr
+
+    def get_params(self):
+        return None
+
+    def get_lib(self):
+        return None
+
+    def __getitem__(self, item):
+        return None
+
+    def __iter__(self):
+        warnings.warn(
+            "legacy graph executor behavior of producing json / lib / params will be "
+            "removed in the next release."
+            " Please see documents of tvm.contrib.graph_executor.GraphModule for the "
+            " new recommended usage.",
+            DeprecationWarning,
+            2,
+        )
+        return self
+
+    def save_config(self, config_path):
+        pass
+
+    def __next__(self):
+        if self.iter_cnt > 2:
+            raise StopIteration
+
+        objs = [self.internal_repr, self.lib, self.params]
+        obj = objs[self.iter_cnt]
+        self.iter_cnt += 1
+        return obj
+
+
+class AOTExecutorFactoryModule(ExecutorFactoryModule):
+    """Graph executor factory module.

Review comment:
       also clarify the comment

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -73,7 +73,7 @@ def _populate_codegen_dir(mod, codegen_dir: str):
         dso_mod.save(file_name)
 
 
-def _build_memory_map(graph_json):
+def _build_memory_map(graph_str):

Review comment:
       can you revert these changes?

##########
File path: include/tvm/runtime/crt/stack_memory.h
##########
@@ -0,0 +1,55 @@
+/*

Review comment:
       let's rename the other `include/tvm/runtime/crt/memory.h` to qualify it. `include/tvm/runtime/crt/page_memory.h`? or, `include/tvm/runtime/crt/memory/page_allocator.h` and equivalent `stack_allocator.h`?

##########
File path: python/tvm/relay/build_module.py
##########
@@ -289,11 +296,17 @@ def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"
 
     with tophub_context:
         bld_mod = BuildModule()
+        runtime_repr, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
+
+        if bld_mod.get_executor() == "aot":
+            executor_factory = _executor_factory.AOTExecutorFactoryModule(
+                ir_mod, target, runtime_repr, runtime_mod, mod_name, params
+            )
+        else:

Review comment:
       add an `elif ... == "graph":` branch and an `else: assert False, <error message>`
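
   i.e., a sketch (the `GraphExecutorFactoryModule` argument list here just mirrors the AOT branch and is illustrative):

```python
if bld_mod.get_executor() == "aot":
    executor_factory = _executor_factory.AOTExecutorFactoryModule(
        ir_mod, target, runtime_repr, runtime_mod, mod_name, params
    )
elif bld_mod.get_executor() == "graph":
    executor_factory = _executor_factory.GraphExecutorFactoryModule(
        ir_mod, target, runtime_repr, runtime_mod, mod_name, params
    )
else:
    assert False, "Executor %s not supported" % bld_mod.get_executor()
```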

##########
File path: src/relay/backend/build_module.cc
##########
@@ -59,17 +62,50 @@ struct BuildOutput {
  */
 struct GraphCodegen {
  public:
-  GraphCodegen() {
-    auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
-    mod = (*pf)();
+  explicit GraphCodegen(Target target_host) : target_host_(target_host) {
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);
+    if (executor_str == kTvmExecutorGraph) {
+      executor_ = Executor::Graph;
+      auto pf = GetPackedFunc("relay.build_module._GraphExecutorCodegen");
+      mod = (*pf)();
+    } else if (executor_str == kTvmExecutorAot) {
+      executor_ = Executor::Aot;
+      auto pf = GetPackedFunc("relay.build_module._GraphAOTCodegen");
+      mod = (*pf)();
+    } else {
+      LOG(FATAL) << "Executor not supported";

Review comment:
       can you print the executor, i.e. `<< "Executor " << executor_str << " not supported"`, here and below?

##########
File path: include/tvm/runtime/crt/stack_memory.h
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+#ifndef TVM_RUNTIME_CRT_STACK_MEMORY_H_
+#define TVM_RUNTIME_CRT_STACK_MEMORY_H_
+#include <stddef.h>
+#include <stdint.h>
+
+#include "error_codes.h"
+
+/*! Memory alignment for allocator */
+
+#ifndef TVM_RUNTIME_ALLOC_ALIGNMENT
+#define TVM_RUNTIME_ALLOC_ALIGNMENT 16
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct {
+  uint8_t* next_alloc;   /** Pointer to the next block of bytes to allocate */

Review comment:
       is it a block of bytes because of alignment?

##########
File path: src/relay/backend/build_module.cc
##########
@@ -43,12 +43,15 @@ namespace backend {
 using TargetsMap = Map<tvm::Integer, tvm::Target>;
 using namespace tvm::relay::transform;
 
+enum class Executor { Graph, Aot };

Review comment:
       can you add a docstring? Also, let's name this ExecutorType or something, since Executor sounds like an interface name.
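
   e.g., something like (a sketch only):

```cpp
/*! \brief The executor used to launch the compiled artifacts. */
enum class ExecutorType {
  kGraph,  /*!< Run through the JSON-driven graph executor. */
  kAot,    /*!< Run through the generated AOT runner function. */
};
```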

##########
File path: src/relay/backend/build_module.cc
##########
@@ -43,12 +43,15 @@ namespace backend {
 using TargetsMap = Map<tvm::Integer, tvm::Target>;
 using namespace tvm::relay::transform;
 
+enum class Executor { Graph, Aot };
+
 /*!
  * \brief Output of building module
  *
  */
 struct BuildOutput {
-  std::string graph_json;
+  std::string graph;

Review comment:
       let's keep `graph_json`

##########
File path: include/tvm/runtime/crt/aot_executor.h
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_executor.h
+ * \brief TVM Executor for the Ahead-of-Time Runtime
+ *
+ * AOT models are described by the TVM model descriptor format
+ * which can be passed to tvm_runtime_run. These descriptors will be
+ * generated by the AOT compilation process. This can optionally be
+ * augmented with platform specific context to be passed to the TVM
+ * operators.
+ *
+ * Example:
+ * extern tvm_model_t my_network;
+ * int main() {
+ *    void* data = get_data();
+ *    void* output[4] = {0, 0, 0, 0};
+ *    void* inputs = {data};
+ *    void* outputs = {output};
+ *    tvm_context_t my_context = {
+ *      .driver = ...;
+ *    };
+ *    tvm_runtime_run(
+ *      &my_network,
+ *      inputs,
+ *      outputs
+ *      &my_context
+ *    );
+ *    return 0;
+ * }
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+
+#include <stdint.h>
+
+#include "error_codes.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!
+ * \brief Context information for future integrations
+ *  which is passed through to the operators.
+ *
+ * \note Can be used for drivers and platform specific information.
+ */
+typedef struct {
+} tvm_context_t;
+
+typedef int32_t(tvm_function_t)(void* args, void* arg_type_ids, int32_t num_args,
+                                void* out_ret_value, void* out_ret_tcode, void* resource_handle);
+
+/*!
+ * \brief TVM Model descriptor to describe the
+ *  model to the runtime.
+ */
+typedef struct {
+  uint32_t num_input_tensors;  /** Number of expected input tensors */
+  uint32_t num_output_tensors; /** Number of expected output tensors */
+  tvm_function_t* run_func;    /** Generated model function, called through tvm_runtime_run */
+} tvm_model_t;
+
+/*!
+ * \brief Main entry point for

Review comment:
       can you complete the docstring?
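
   e.g., something along these lines (a sketch only; the parameter list is inferred from the example at the top of this header and may not match the final signature):

```c
/*!
 * \brief Main entry point to run a model compiled by the AOT flow.
 * \param model The tvm_model_t descriptor generated for the network.
 * \param inputs Array of model->num_input_tensors input tensor pointers.
 * \param outputs Array of model->num_output_tensors output tensor pointers.
 * \param context Optional platform-specific context forwarded to the operators.
 * \return Zero on success, a nonzero error code otherwise.
 */
```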

##########
File path: python/tvm/relay/build_module.py
##########
@@ -181,9 +185,9 @@ def optimize(self, mod, target=None, params=None):
     def _set_params(self, params):
         self._set_params_func(_convert_param_map(params))
 
-    def get_json(self):
+    def get_graph(self):

Review comment:
       can you revert this change?

##########
File path: include/tvm/runtime/crt/stack_memory.h
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+#ifndef TVM_RUNTIME_CRT_STACK_MEMORY_H_
+#define TVM_RUNTIME_CRT_STACK_MEMORY_H_
+#include <stddef.h>
+#include <stdint.h>
+
+#include "error_codes.h"

Review comment:
       I think this should be #include <tvm/runtime/crt/error_codes.h>

##########
File path: python/tvm/driver/tvmc/compiler.py
##########
@@ -239,7 +239,7 @@ def compile_model(
 
     # TODO we need to update this return to use the updated graph module APIs
     #      as these getter functions will be deprecated in the next release (@leandron)
-    return graph_module.get_json(), graph_module.get_lib(), graph_module.get_params(), dumps
+    return graph_module.get_graph(), graph_module.get_lib(), graph_module.get_params(), dumps

Review comment:
       can you revert this change?

##########
File path: tests/python/unittest/test_runtime_module_based_interface.py
##########
@@ -526,11 +526,11 @@ def test_debug_graph_executor():
     out = get_output(0).asnumpy()
     tvm.testing.assert_allclose(out, verify(data), atol=1e-5)
 
-    # debug graph executor wrapper
-    debug_g_mod = debug_executor.GraphModuleDebug(
-        complied_graph_lib["debug_create"]("default", dev),
-        [dev],
-        complied_graph_lib.get_json(),
+    # debug graph runtime wrapper
+    debug_g_mod = debug_runtime.GraphModuleDebug(

Review comment:
       revert this change

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -0,0 +1,216 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Graph executor factory."""
+import warnings
+from ..._ffi.base import string_types
+from ..._ffi.registry import get_global_func
+from ...runtime import ndarray
+from tvm import tir
+
+
+class ExecutorFactoryModule:
+    """Graph executor factory module.
+    This is a module of graph executor factory
+
+    Parameters
+    ----------
+    graph_str : str
+        Depending on executor:
+        * Graph executor: the graph to be deployed in json format output by graph compiler.
+        The graph can contain operator(tvm_op) that points to the name of
+        PackedFunc in the libmod.
+        * AOT executor: the string representation of the TIR executor PrimFunction
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def get_internal_repr(self):
+        return self.internal_repr
+
+    def get_params(self):
+        return None
+
+    def get_lib(self):
+        return None
+
+    def __getitem__(self, item):
+        return None
+
+    def __iter__(self):
+        warnings.warn(
+            "legacy graph executor behavior of producing json / lib / params will be "
+            "removed in the next release."
+            " Please see documents of tvm.contrib.graph_executor.GraphModule for the "
+            " new recommended usage.",
+            DeprecationWarning,
+            2,
+        )
+        return self
+
+    def save_config(self, config_path):

Review comment:
       I'd vote for us to qualify config: `save_executor_config`. Also, let's have this function either return a string containing the config or accept an `io.BytesIO`, so that it's not required to write to a file.
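
   For concreteness, one possible shape (a sketch; method names as suggested above):

```python
import io


class ExecutorFactoryModule:
    """Sketch: configuration accessors shared by graph and AOT factories."""

    def get_executor_config(self):
        """Return the executor configuration (graph JSON or TIR runner repr) as a string."""
        return str(self.internal_repr)

    def save_executor_config(self, file_like: io.BytesIO):
        """Write the executor configuration to an already-open binary file-like object."""
        file_like.write(self.get_executor_config().encode("utf-8"))
```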

##########
File path: include/tvm/runtime/crt/aot_executor.h
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_executor.h
+ * \brief TVM Executor for the Ahead-of-Time Runtime
+ *
+ * AOT models are described by the TVM model descriptor format
+ * which can be passed to tvm_runtime_run. These descriptors will be
+ * generated by the AOT compilation process. This can optionally be
+ * augmented with platform specific context to be passed to the TVM
+ * operators.
+ *
+ * Example:
+ * extern tvm_model_t my_network;
+ * int main() {
+ *    void* data = get_data();
+ *    void* output[4] = {0, 0, 0, 0};
+ *    void* inputs = {data};
+ *    void* outputs = {output};
+ *    tvm_context_t my_context = {
+ *      .driver = ...;
+ *    };
+ *    tvm_runtime_run(
+ *      &my_network,
+ *      inputs,
+ *      outputs
+ *      &my_context
+ *    );
+ *    return 0;
+ * }
+ */
+
+#ifndef TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+#define TVM_RUNTIME_CRT_AOT_TVM_EXECUTOR_H_
+
+#include <stdint.h>
+
+#include "error_codes.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!
+ * \brief Context information for future integrations
+ *  which is passed through to the operators.
+ *
+ * \note Can be used for drivers and platform specific information.
+ */
+typedef struct {

Review comment:
       GH has decided the past comment is out of date :(
   
   I would prefer to merge this structure in a follow-on PR where it's used, if that's OK with you.

##########
File path: src/relay/backend/build_module.cc
##########
@@ -166,6 +216,14 @@ class RelayBuildModule : public runtime::ModuleNode {
         ICHECK_EQ(args.num_args, 2);
         *rv = this->Optimize(args[0], args[1], this->params_);
       });
+    } else if (name == "get_executor") {

Review comment:
       let's update this function name to match the name we choose for Executor enum above




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] tqchen commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-830617762


   Added one more comment. I dismissed my original requests as they have been fulfilled. I didn't get a chance to do a full read, but I will leave it to @areusch to manage the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r617667974



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       I agree. I think there is a set of options we've been putting into Target which don't belong there. `--runtime` is another great example. I think we need to create a separate config object, e.g. `deploy_env`, which could be a parameter to `tvm.relay.build()`. I'm not sure we need to address this in this PR, since I think it will be a fairly invasive change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-825603629


   Hi @areusch,
   I worked on your suggestions; this is what I did:
   * Moved meta_data.h into include/tvm/runtime. This lets us use runtime::kTvmExecutor{Graph,Aot} from the tests as well, so we don't need to `include` relative paths in the files.
   * Added the stack FIFO memory checks. We are "wasting" 4 bytes per block, so I enabled them only when `TVM_CRT_RUNTIME` is defined.
   * In the Executor factory, I merged `get_internal_repr` and `save_executor_config` into a single `get_executor_config`.
   * Added tests for the sid sizes.
   The only thing I wasn't able to do was add tests for FIFO enforcement. I have a test working in my local repo, but I am not able to define `TVM_CRT_RUNTIME` for `aot_memory_test.cc` alone (ideally it would be defined for just that single test). Any suggestions?
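   One option might be to scope the define to that single source file via CMake (assuming `aot_memory_test.cc` is compiled into the existing C++ test target; the path below is illustrative):

```cmake
# Hypothetical: apply TVM_CRT_RUNTIME only when compiling this one test file.
set_source_files_properties(tests/crt/aot_memory_test.cc
                            PROPERTIES COMPILE_DEFINITIONS TVM_CRT_RUNTIME)
```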


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608909183



##########
File path: python/tvm/relay/build_module.py
##########
@@ -287,7 +289,8 @@ def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"
 
     with tophub_context:
         bld_mod = BuildModule()
-        graph_json, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
+
+        graph, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
         executor_factory = _graph_executor_factory.GraphExecutorFactoryModule(

Review comment:
       that matches with my preference, see other thread




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r610716277



##########
File path: include/tvm/runtime/module.h
##########
@@ -230,6 +230,8 @@ constexpr const char* tvm_module_main = "__tvm_main__";
 constexpr const char* tvm_param_prefix = "__tvm_param__";
 /*! \brief A PackedFunc that looks up linked parameters by storage_id. */
 constexpr const char* tvm_lookup_linked_param = "_lookup_linked_param";
+/*! \brief The main AOT executor function */
+constexpr const char* tvm_run_func_prefix = "tvm__run_func";

Review comment:
       PR is here: #7815




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r617747883



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       So, this change was quite straightforward to make. I just pushed it; see how it looks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-822853857


   Hi @manupa-arm, @areusch,
   I applied most of the changes. Namely:
   * Moved to a `call_cpacked` builtin
   * Moved `aot_executor.h` into internal
   * Moved to use `tir::Allocate`
   * Removed the `aot_backend_api.c` (and defined a fake `TVMFuncRegisterGlobal` in `test.c`)
   * Removed the runner-function `PrimFunc` that AOT returned as its internal representation
   I partially addressed the following:
   * About the unification of `BuildOutput`: @manupa-arm, we agreed that it is too complex to do now because of the `get_graph_json` method defined in `RelayBuildModule`.
   * Instead of passing the `LoweredOutput` to `CreateMetadataModule`, I generalized `AOTMetadata` into a generic `Metadata` and used that. The main reason is that otherwise I would need to expose the `LoweredOutput` class to the Object system, which seems too much of an internal back-end class to expose.

   Please let me know what you think!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615040646



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);

Review comment:
       Sorry, not sure I get exactly what you are proposing :) 
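
   For context, the trick on the line under discussion: `~nbytes + 1` equals `-nbytes` in two's complement, so the expression computes `(-nbytes) mod TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES`, i.e. the padding needed to round `nbytes` up to the next multiple of the (power-of-two) alignment. A worked sketch:

```c
#include <stdint.h>

// With nbytes = 5 and a 16-byte alignment:
//   (~5 + 1) & 15  ==  (-5) & 15  ==  11,  and  5 + 11 == 16.
uint32_t padding(uint32_t nbytes, uint32_t align /* power of two */) {
  uint32_t offset_bytes = (~nbytes + 1) & (align - 1);
  // Equivalent round-up formulation: (nbytes + align - 1) & ~(align - 1)
  // gives nbytes + offset_bytes directly.
  return offset_bytes;
}
```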




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608902101



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -86,10 +86,13 @@ def _build_memory_map(graph_json):
     list :
         A list with one entry per storage id describing that memory.
     """
-    graph = json.loads(graph_json)
+    memory_map = []
+    if graph_str.startswith("primfn"):

Review comment:
       Okay yeah I understand what you're doing here, but I agree with @manupa-arm that we should not overload graph_json. `tvm.relay.build` is currently specific to building artifacts to be launched by GraphExecutor. I don't think it should remain specific to that, but that's our API surface now.
   
    Currently, although you can unpack `graph_json, lib, params`, this is actually deprecated behavior. The documented return value is a `Module` implementing the `GraphExecutorFactoryModule` interface, which is wrapped by a Python-side `GraphExecutorFactoryModule` class. I think we should create a corresponding `AotExecutorFactoryModule` and update `tvm.micro.export_model_library_format` to recognize that and consume it appropriately. Though this is not proper duck-typing, I think we eventually intend to promote `export_model_library_format` to a member function of `*ExecutorFactoryModule` once it's completely solidified and gains traction outside of the C runtime (e.g. once it is supported in the C++ cases for `apps/bundle_deploy` and perhaps with GPUs).
   
    With regard to the memory map: I'd like to separate the tensor pinning in this PR from the AOT change and instead (in the short term) continue to rely on TVMBackendAllocWorkspace for AOT-allocated tensors. I agree we need to do something _like_ tensor pinning, but I'd prefer to resolve that in a separate RFC/PR inside GraphPlanMemory. Such an RFC should also broaden GraphPlanMemory and introduce concepts such as `storage_id` as user-facing identifiers, so that developers understand how to communicate tensor placement to the TVM runtime.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r617672408



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       Alright! Copying @mbaret.
   @giuseros, did you have to work around something because of this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614972206



##########
File path: include/tvm/runtime/crt/page_allocator.h
##########
@@ -18,12 +18,12 @@
  */
 
 /*!
- * \file tvm/runtime/crt/memory.h
+ * \file tvm/runtime/crt/page_allocator.h

Review comment:
       Want to also rename `MemoryManagerCreate` to e.g. `PageMemoryManagerCreate`, since we now have `StackMemoryManager`?

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,125 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
+from tvm import tir
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """Common function to return the internal representation
+        the executor relies upon to execute the network
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_params(self):
+        """
+        Sometimes we want to get params explicitly.
+        For example, we want to save its params value to
+        an independent file.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_lib(self):
+        """ Return the generated library"""
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """ Return the internal representation used to execute the network"""
+        raise NotImplementedError
+
+    def __getitem__(self, item):
+        print(item)
+        return self.module.__getitem__(item)
+
+    def __iter__(self):
+        warnings.warn(
+            "legacy graph executor behavior of producing json / lib / params will be "
+            "removed in the next release."
+            " Please see documents of tvm.contrib.graph_executor.GraphModule for the "
+            " new recommended usage.",
+            DeprecationWarning,
+            2,
+        )
+        return self
+
+    def __next__(self):
+        if self.iter_cnt > 2:
+            raise StopIteration
+
+        objs = [self.get_internal_repr(), self.lib, self.params]
+        obj = objs[self.iter_cnt]
+        self.iter_cnt += 1
+        return obj
+
+
+class AOTExecutorFactoryModule(ExecutorFactoryModule):
+    """AOT executor factory module.

Review comment:
       I think this belongs on `__init__`, or change Parameters to Attributes. https://numpydoc.readthedocs.io/en/latest/format.html#class-docstring

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,125 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
+from tvm import tir
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """Common function to return the internal representation

Review comment:
       @giuseros can you follow the numpydoc style: https://numpydoc.readthedocs.io/en/latest/format.html#sections
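
   e.g., a minimal numpydoc-style sketch for one of these methods:

```python
def get_internal_repr(self):
    """Return the representation the executor uses to run the network.

    Returns
    -------
    internal_repr : Union[str, tvm.tir.PrimFunc]
        The graph JSON for the graph executor, or the TIR runner
        PrimFunc for the AOT executor.
    """
    raise NotImplementedError
```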

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -211,21 +214,34 @@ void CodeGenCHost::PrintGetFuncFromBackend(const std::string& func_name,
   this->stream << "}\n";
 }
 
-void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, int num_args) {
+void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, PrimExpr values,
+                                 int num_args) {
   this->PrintIndent();
+  std::string stack_value = "stack_value";

Review comment:
       Does it make sense to add something explicitly indicating this is a default, e.g. `unnamed_stack_value`?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[params_by_expr_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * brief Given an expression return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (params_by_expr_.find(arg) != params_by_expr_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // It's safe to check the SID here because Var StorageTokens are never reallocated
+    Array<IntegerArray> sids = storage_device_map_[expr];
+
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    params_by_expr_.Set(expr, name);
+
+    // If the Constant node is an output node we need to copy the content of the parameter to the
+    // output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    CHECK(false) << "Let not yet implemented in AOT";
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (params_by_expr_.find(kv.first) != params_by_expr_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        // TODO(giuseros): we should allocate this one time outside the PrimFunc
+        // so we don't pay the price of allocation for every inference
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief module being generated */
+  runtime::Module* mod_;
+  /*! \brief list of input expressions (i.e., variable passed by the user) */
+  std::vector<Expr> input_vars_;
+  /*! \brief input and output variables belonging to the main function signature */
+  Array<tir::Var> main_signature_;
+  /*! \brief target device */
+  TargetsMap targets_;
+  /*! \brief target host */
+  Target target_host_;
+  /*! \brief PrimFunc attributes */
+  Map<String, ObjectRef> dict_attrs_;
+
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to look up the parameter.
+   */
+  std::unordered_map<std::string, runtime::NDArray> params_;
+  /*! \brief mapping between expression and parameters */
+  Map<Expr, String> params_by_expr_;
+  /*! \brief mapping between parameter names ("p0", "p1", etc..) and storage identifiers*/
+  std::unordered_map<std::string, int64_t> param_storage_ids_;
+
+  /*! \brief plan memory of device result */
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  std::unordered_map<int, te::Var> sids_table_;
+  /*! \brief lowered funcs */
+  std::unordered_map<std::string, IRModule> lowered_funcs_;
+  /*! \brief name map */
+  std::unordered_map<std::string, size_t> name_map_;
+  /*! \brief compile engine */
+  CompileEngine compile_engine_;
+  /*! \brief GraphPlanMemory module */
+  runtime::Module graph_plan_memory_module_;
+  /*! \brief the IR module stored which represents the executor program */
+  Map<String, IRModule> tir_module_;
+  /*! \brief the set of statements that make the program */
+  std::vector<tir::Stmt> stmts_;
+  /*! \brief the list of return sids (note that the function might return more than one output) */
+  IntegerArray return_sid_;
+
+ public:
+  AOTCodegen(runtime::Module* mod, const TargetsMap& targets, Target target_host)
+      : mod_(mod), return_sid_() {
+    compile_engine_ = CompileEngine::Global();
+    targets_ = targets;
+    target_host_ = target_host;
+    dict_attrs_.Set("global_symbol", runtime::String("tvm__run_func"));

Review comment:
       does this need to be class-level?
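   
   e.g. the attribute map is only consumed when the PrimFunc is assembled, so it could be built locally. A rough sketch (not code from this PR; names taken from the constructor above):
   
       tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
         tir::Stmt body = tir::SeqStmt(stmts_);
         // ... sid allocation and AttrStmt wrapping as above ...
         Map<String, ObjectRef> dict_attrs;
         dict_attrs.Set("global_symbol", runtime::String("tvm__run_func"));
         return tir::PrimFunc(main_signature_, body, VoidType(),
                              Map<tir::Var, tir::Buffer>(), DictAttrs(dict_attrs));
       }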

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represent the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[params_by_expr_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (params_by_expr_.find(arg) != params_by_expr_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to concatenate different arguments into a string
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // It's safe to check the SID here because Var StorageTokens are never reallocated
+    Array<IntegerArray> sids = storage_device_map_[expr];
+
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    params_by_expr_.Set(expr, name);
+
+    // If the Constant node is an output node we need to copy the content of the parameter to the
+    // output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    CHECK(false) << "Let not yet implemented in AOT";
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";

Review comment:
       nit: "FunctionNode only" or just state not supported by AOT

##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,36 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include \"aot_executor.h\"\n";
+    code_ << "#include \"tvm/runtime/c_runtime_api.h\"\n";
+    code_ << "#ifdef __cplusplus\n";
+    code_ << "extern \"C\"\n";
+    code_ << "#endif\n";
+    code_ << "TVM_DLL int32_t " << ::tvm::runtime::symbol::tvm_run_func_prefix;
+    code_ << "(void* args, void* type_code, int num_args, void* out_value, void* "
+             "out_type_code, void* resource_handle);\n";
+    code_ << "const tvm_model_t network = {\n"
+          << "    .run_func = &" << ::tvm::runtime::symbol::tvm_run_func_prefix << ",\n"
+          << "    .num_input_tensors = " << aot_metadata_->num_inputs << ",\n"
+          << "    .num_output_tensors = " << aot_metadata_->num_outputs << ", \n"
+          << "};\n";
+  }
+
   void CreateSource() {
     if (target_->GetAttr<Bool>("system-lib").value_or(Bool(false)) && !func_names_.empty()) {
       CreateFuncRegistry();
       GenerateCrtSystemLib();
     }
+    if (target_->GetAttr<String>("executor").value_or("graph_runtime") == "aot") {

Review comment:
       should just be "graph" or "aot". could you use constants here?

##########
File path: tests/python/relay/aot/infra.py
##########
@@ -0,0 +1,226 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This module provides infrastructure to verify the correctness of AOT-compiled
+networks. It generates a standalone C test runner that feeds the input tensors
+to the compiled network, compares the outputs against the expected results,
+and builds and runs the whole thing on the host.
+"""
+import tflite
+import os
+import io
+import struct
+import numpy as np
+import pathlib
+import shutil
+import subprocess
+import tempfile
+import tarfile
+
+
+import tvm
+from tvm import relay
+from tvm.relay import transform
+from tvm.relay.op.contrib import get_pattern_table
+from tvm.contrib import utils, graph_executor
+from tvm.relay.backend import compile_engine
+from tvm.contrib import graph_runtime
+from tvm.micro import export_model_library_format
+
+
+def subprocess_with_stdout_and_log(cmd, cwd, logfile, stdout):
+    """
+    This method runs a process and logs the output to both a log file and stdout
+    """
+    with subprocess.Popen(
+        cmd, cwd=cwd, shell=True, bufsize=0, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
+    ) as proc, open(logfile, "a") as f:
+        while True:
+            data = proc.stdout.readline()
+            result = proc.poll()
+            # process is done if there is no data and the result is valid
+            if data == b"" and result is not None:
+                return int(result)
+            if data:
+                text = data.decode("ascii", errors="backslashreplace")
+                f.write(text)
+                if stdout:
+                    print(text, end="")
+
+
+def create_main(test_name, input_list, output_list, output_path):
+    file_path = pathlib.Path(f"{output_path}/" + test_name).resolve()
+    # create the main .c file
+    raw_path = file_path.with_suffix(".c").resolve()
+    with open(raw_path, "w") as main_file:
+        main_file.write("#include <stdio.h>\n")
+        main_file.write('#include "aot_executor.h"\n')
+        main_file.write('#include "stack_allocator.h"\n')
+        main_file.write("#define WORKSPACE_SIZE (16384*1024)\n")
+        main_file.write("static uint8_t g_aot_memory[WORKSPACE_SIZE];\n")
+
+        for i in range(0, len(input_list)):
+            main_file.write('#include "input_data%i.h"\n' % i)
+        for i in range(0, len(output_list)):
+            main_file.write('#include "expected_output_data%i.h"\n' % i)
+            main_file.write('#include "output_data%i.h"\n' % i)
+
+        main_file.write("extern tvm_model_t network;\n")
+        main_file.write("tvm_workspace_t app_workspace;\n")
+        main_file.write(
+            """
+tvm_crt_error_t TVMPlatformMemoryAllocate(size_t num_bytes, DLDevice dev, void** out_ptr) {
+    (*out_ptr) = StackMemoryManager_Allocate(&app_workspace, num_bytes);
+    return kTvmErrorNoError;
+}
+
+tvm_crt_error_t TVMPlatformMemoryFree(void* ptr, DLDevice dev) {
+    StackMemoryManager_Free(&app_workspace, ptr);
+    return kTvmErrorNoError;
+}
+
+void TVMPlatformAbort(tvm_crt_error_t code) { }
+
+void TVMLogf(const char* msg, ...) { }
+      
+        """
+        )
+        main_file.write("int main(){\n")
+        main_file.write("void* inputs[%i] = { " % (len(input_list)))
+
+        for i in range(0, len(input_list)):
+            main_file.write("input_data%i, " % i)
+        main_file.write("};\n")
+
+        main_file.write("void* outputs[%i]  = { " % (len(output_list)))
+        for i in range(0, len(output_list)):
+            main_file.write("output_data%i, " % i)
+        main_file.write("};\n")
+
+        main_file.write("StackMemoryManager_Init(&app_workspace, g_aot_memory, WORKSPACE_SIZE);")
+        main_file.write("tvm_runtime_run(&network, inputs, outputs);")
+
+        for i in range(0, len(output_list)):
+            main_file.write("for (int i = 0; i<output_data%i_len; i++){\n" % i)
+            main_file.write(
+                'if (output_data%s[i]!=expected_output_data%s[i]){printf("ko\\n");return -1;}\n'
+                % (i, i)
+            )
+            main_file.write("}\n")
+
+        main_file.write('printf("ok\\n");')
+        main_file.write("return 0;")
+        main_file.write("}\n")
+
+
+def create_header_file(tensor_name, npy_data, output_path):
+    """
+    This method generates a header file containing the data from the provided numpy array.
+    It is used to capture the tensor data (for both inputs and expected outputs) to be bundled into the standalone test runner.
+    """
+    file_path = pathlib.Path(f"{output_path}/" + tensor_name).resolve()
+    # create header file
+    raw_path = file_path.with_suffix(".h").resolve()
+    with open(raw_path, "w") as header_file:
+        header_file.write("#include <stddef.h>\n")
+        header_file.write("#include <stdint.h>\n")
+        header_file.write("#include <dlpack/dlpack.h>\n")
+        header_file.write(f"const size_t {tensor_name}_len = {npy_data.size};\n")
+
+        if npy_data.dtype == "int8":
+            header_file.write(f"int8_t {tensor_name}[] =")
+        elif npy_data.dtype == "int32":
+            header_file.write(f"int32_t {tensor_name}[] = ")
+        elif npy_data.dtype == "uint8":
+            header_file.write(f"uint8_t {tensor_name}[] = ")
+        elif npy_data.dtype == "float32":
+            header_file.write(f"float {tensor_name}[] = ")
+
+        header_file.write("{")
+        for i in np.ndindex(npy_data.shape):
+            header_file.write(f"{npy_data[i]}, ")
+        header_file.write("};\n\n")
+
+
+def verify_source(mod, input_list, output_list, params=None):

Review comment:
       I think you could call this something like compile_and_run; verify_source sounds like it's just going to assert on the source code content.

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,125 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
+from tvm import tir
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """Common function to return the internal representation
+        the executor relies upon to execute the network
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_params(self):
+        """
+        Sometimes we want to get params explicitly.
+        For example, we want to save its params value to
+        an independent file.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_lib(self):
+        """ Return the generated library"""
+        raise NotImplementedError
+
+    def __getitem__(self, item):
+        return self.module.__getitem__(item)
+
+    def __iter__(self):
+        warnings.warn(
+            "legacy graph executor behavior of producing json / lib / params will be "
+            "removed in the next release."
+            " Please see documents of tvm.contrib.graph_executor.GraphModule for the "
+            " new recommended usage.",
+            DeprecationWarning,
+            2,
+        )
+        return self
+
+    def __next__(self):
+        if self.iter_cnt > 2:
+            raise StopIteration
+
+        objs = [self.get_internal_repr(), self.lib, self.params]
+        obj = objs[self.iter_cnt]
+        self.iter_cnt += 1
+        return obj
+
+
+class AOTExecutorFactoryModule(ExecutorFactoryModule):
+    """AOT executor factory module.
+
+    Parameters
+    ----------
+    runner_function : tir.PrimFunc
+        The PrimFunc containing the TIR main executor function.
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def __init__(self, ir_mod, target, runner_function, libmod, libmod_name, params):
+        assert isinstance(runner_function, tir.PrimFunc)
+        args = []

Review comment:
       this seems unused here

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       ok @giuseros, I agree this PR is pretty close to what I was proposing as a first cut. There are two areas where I'd disagree:
   1. the `tir.call_cpacked` intrinsic--I think we are in agreement here, so let's see if this works out. if not, perhaps we can rely on the FuncRegistry and introduce a workaround in a follow-on PR.
   2. I think given we are deferring discussing the firmware-facing API, we should not check in `include/tvm/runtime/crt/aot_executor.h` and instead place this in `src/runtime/crt/include/tvm/runtime/crt/common/`. This way, you can use it from tests without us declaring it as a public API just yet. There might be one tricky thing here--you might need to tweak your aot_test.mk to place those files in the public include path.
   
   how does this sound as far as a scope for this PR?
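   
   for reference, the distinction in point 1 comes down to the generated C looking roughly like this (a sketch with made-up names, assuming the packed-function calling convention):
   
       /* tvm_call_packed today: resolve the callee through the module context
          at runtime, then invoke it via the C API. */
       TVMFunctionHandle fused_op;
       TVMBackendGetFuncFromEnv(mod_ctx, "fused_add", &fused_op);
       TVMFuncCall(fused_op, values, type_codes, 3, &ret_value, &ret_type_code);
   
       /* a tir.call_cpacked-style intrinsic: call the symbol directly with the
          same packed signature, skipping the string lookup entirely. */
       fused_add(values, type_codes, 3, &ret_value, &ret_type_code, NULL);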

##########
File path: tests/python/relay/aot/aot_test.mk
##########
@@ -0,0 +1,71 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# Makefile to build ethosu_test_runner
+# Setup build environment
+#
+AOT_ROOT ?= $(TVM_ROOT)/src/runtime/crt/aot

Review comment:
       @giuseros just curious if you have thoughts on this?

##########
File path: python/tvm/relay/build_module.py
##########
@@ -111,7 +113,7 @@ def build(self, mod, target=None, target_host=None, params=None):
 
         Returns
         -------
-        factory_module : tvm.relay.backend.graph_executor_factory.GraphExecutorFactoryModule
+        factory_module : tvm.relay.backend.executor_factory.ExecutorFactoryModule

Review comment:
       could you update the docstring to reflect the tuple return value?

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,125 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
+from tvm import tir
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """Common function to return the internal representation
+        the executor relies upon to execute the network
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_params(self):
+        """
+        Sometimes we want to get params explicitly.
+        For example, we want to save its params value to
+        an independent file.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def get_lib(self):
+        """ Return the generated library"""
+        raise NotImplementedError
+
+    def __getitem__(self, item):
+        return self.module.__getitem__(item)
+
+    def __iter__(self):
+        warnings.warn(
+            "legacy graph executor behavior of producing json / lib / params will be "
+            "removed in the next release."
+            " Please see documents of tvm.contrib.graph_executor.GraphModule for the "
+            " new recommended usage.",
+            DeprecationWarning,
+            2,
+        )
+        return self
+
+    def __next__(self):
+        if self.iter_cnt > 2:
+            raise StopIteration
+
+        objs = [self.get_internal_repr(), self.lib, self.params]
+        obj = objs[self.iter_cnt]
+        self.iter_cnt += 1
+        return obj
+
+
+class AOTExecutorFactoryModule(ExecutorFactoryModule):
+    """AOT executor factory module.
+
+    Parameters
+    ----------
+    runner_function : tir.PrimFunc
+        The PrimFunc containing the TIR main executor function.
+    target : tvm.Target
+        The Target used to build this module.
+    libmod : tvm.Module
+        The module of the corresponding function
+    libmod_name: str
+        The name of module
+    params : dict of str to NDArray
+        The parameters of module
+    """
+
+    def __init__(self, ir_mod, target, runner_function, libmod, libmod_name, params):
+        assert isinstance(runner_function, tir.PrimFunc)
+        args = []
+        for k, v in params.items():
+            args.append(k)
+            args.append(ndarray.array(v))
+
+        self.ir_mod = ir_mod
+        self.target = target
+        self.runner_func = runner_function
+        self.lib = libmod
+        self.libmod_name = libmod_name
+        self.params = params
+        self.iter_cnt = 0
+
+    # Sometimes we want to get params explicitly.

Review comment:
       make this a docstring or rm

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -126,20 +125,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
 
     Parameters
     ----------
-    mod : tvm.relay.backend.graph_executor_factory.GraphExecutorFactoryModule
+    mod : tvm.relay.backend.executor_factory.ExecutorFactoryModule
         The return value of tvm.relay.build, which will be exported into Model Library Format.
     file_name : str
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = isinstance(mod, executor_factory.AOTExecutorFactoryModule)

Review comment:
       I think we should either update the docstring/type annotation or remove this logic.

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represent the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {

Review comment:
       could you rename these to follow style guide, since they're not accessors/mutators? https://google.github.io/styleguide/cppguide.html#Function_Names
   
   e.g. PackSid or PackSid_
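   
   applied across the helpers, that would be (declarations only, bodies unchanged):
   
       std::vector<tir::Var> PackSid(Expr expr);
       tir::Var PackParam(Expr expr);
       std::vector<te::Var> FindExpr(Expr arg);
       void FuncCall(Call call, std::string func_name);
       void CopyToOutput(te::Var out, te::Var in, size_t size);
       template <typename... Args>
       std::string MakeString(Args const&... args);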

##########
File path: python/tvm/relay/build_module.py
##########
@@ -287,10 +299,19 @@ def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"
 
     with tophub_context:
         bld_mod = BuildModule()
-        graph_json, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
-        executor_factory = _graph_executor_factory.GraphExecutorFactoryModule(
-            ir_mod, target, graph_json, runtime_mod, mod_name, params
-        )
+        internal_repr, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
+
+        if bld_mod.get_executor_type() == "aot":
+            executor_factory = _executor_factory.AOTExecutorFactoryModule(
+                ir_mod, target, internal_repr, runtime_mod, mod_name, params
+            )
+        elif bld_mod.get_executor_type() == "graph":
+            executor_factory = _executor_factory.GraphExecutorFactoryModule(
+                ir_mod, target, internal_repr, runtime_mod, mod_name, params
+            )
+        else:
+            assert False, "Executor not supported"

Review comment:
       could you include `build_mod.get_executor_type()` in the message?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represent the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[params_by_expr_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (params_by_expr_.find(arg) != params_by_expr_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to concatenate different arguments into a string
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    // Expand the parameter pack, streaming each argument into ss in order.
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node, we need to copy the content of the variable to the
+    // output. It's safe to check the SID here because a Var's StorageToken is never reallocated
+    Array<IntegerArray> sids = storage_device_map_[expr];
+
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    params_by_expr_.Set(expr, name);
+
+    // If the Constant node is an output node, we need to copy the content of the parameter to
+    // the output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    CHECK(false) << "Let not yet implemented in AOT";
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (params_by_expr_.find(kv.first) != params_by_expr_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        // TODO(giuseros): we should allocate this one time outside the PrimFunc
+        // so we don't pay the price of allocation for every inference
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief module being generated */
+  runtime::Module* mod_;
+  /*! \brief list of input expressions (i.e., variable passed by the user) */
+  std::vector<Expr> input_vars_;
+  /*! \brief input and output variables belonging to the main function signature */
+  Array<tir::Var> main_signature_;
+  /*! \brief target device */
+  TargetsMap targets_;
+  /*! \brief target host */
+  Target target_host_;
+  /*! \brief PrimFunc attributes */
+  Map<String, ObjectRef> dict_attrs_;
+
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to lookup the parameter.
+   */
+  std::unordered_map<std::string, runtime::NDArray> params_;
+  /*! \brief mapping between expression and parameters */
+  Map<Expr, String> params_by_expr_;
+  /*! \brief mapping between parameter names ("p0", "p1", etc.) and storage identifiers */
+  std::unordered_map<std::string, int64_t> param_storage_ids_;
+
+  /*! \brief plan memory of device result */
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  std::unordered_map<int, te::Var> sids_table_;
+  /*! \brief lowered funcs */
+  std::unordered_map<std::string, IRModule> lowered_funcs_;
+  /*! \brief name map */
+  std::unordered_map<std::string, size_t> name_map_;
+  /*! \brief compile engine */
+  CompileEngine compile_engine_;
+  /*! \brief GraphPlanMemory module */
+  runtime::Module graph_plan_memory_module_;
+  /*! \brief the stored IR module that represents the executor program */
+  Map<String, IRModule> tir_module_;

Review comment:
       is this used?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameter
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[params_by_expr_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (params_by_expr_.find(arg) != params_by_expr_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to concatenate different arguments into a string
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    // Expand the parameter pack, streaming each argument into ss in order.
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node, we need to copy the content of the variable to the
+    // output. It's safe to check the SID here because a Var's StorageToken is never reallocated
+    Array<IntegerArray> sids = storage_device_map_[expr];
+
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    params_by_expr_.Set(expr, name);
+
+    // If the Constant node is an output node, we need to copy the content of the parameter to
+    // the output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    CHECK(false) << "Let not yet implemented in AOT";
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (params_by_expr_.find(kv.first) != params_by_expr_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        // TODO(giuseros): we should allocate this one time outside the PrimFunc
+        // so we don't pay the price of allocation for every inference
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief module being generated */
+  runtime::Module* mod_;
+  /*! \brief list of input expressions (i.e., variable passed by the user) */
+  std::vector<Expr> input_vars_;
+  /*! \brief input and output variables belonging to the main function signature */
+  Array<tir::Var> main_signature_;
+  /*! \brief target device */
+  TargetsMap targets_;
+  /*! \brief target host */
+  Target target_host_;
+  /*! \brief PrimFunc attributes */
+  Map<String, ObjectRef> dict_attrs_;
+
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to lookup the parameter.
+   */
+  std::unordered_map<std::string, runtime::NDArray> params_;
+  /*! \brief mapping between expression and parameters */
+  Map<Expr, String> params_by_expr_;
+  /*! \brief mapping between parameter names ("p0", "p1", etc.) and storage identifiers */
+  std::unordered_map<std::string, int64_t> param_storage_ids_;
+
+  /*! \brief plan memory of device result */
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  std::unordered_map<int, te::Var> sids_table_;
+  /*! \brief lowered funcs */
+  std::unordered_map<std::string, IRModule> lowered_funcs_;
+  /*! \brief name map */
+  std::unordered_map<std::string, size_t> name_map_;

Review comment:
       is this used?

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,704 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  tir::PrimFunc runner_func;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+/*! \brief Code generator for AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameter
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[params_by_expr_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (params_by_expr_.find(arg) != params_by_expr_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to concatenate different arguments into a string
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    // Expand the parameter pack, streaming each argument into ss in order.
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node, we need to copy the content of the variable to the
+    // output. It's safe to check the SID here because a Var's StorageToken is never reallocated
+    Array<IntegerArray> sids = storage_device_map_[expr];
+
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      auto var_expr = find_expr(expr);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], var_expr[0], sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+    size_t index = params_.size();
+    std::string name = "p" + std::to_string(index);
+
+    param_storage_ids_[name] = storage_device_map_[expr][0][0]->value;
+    params_[name] = op->data;
+    params_by_expr_.Set(expr, name);
+
+    // If the Constant node is an output node, we need to copy the content of the parameter to
+    // the output. A Constant node can only produce a single output
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    auto output_iter = std::find(return_sid_.begin(), return_sid_.end(),
+                                 static_cast<int>((sids[0][0].as<IntImmNode>())->value));
+    if (output_iter != return_sid_.end()) {
+      int output_index = std::distance(return_sid_.begin(), output_iter);
+      copy_to_output(main_signature_[input_vars_.size() + output_index], pack_param(expr),
+                     sids[2][0]);
+    }
+  }
+
+  void VisitExpr_(const TupleNode* op) override {
+    for (auto field : op->fields) {
+      VisitExpr(field);
+    }
+  }
+
+  void VisitExpr_(const LetNode* op) override {
+    // TODO(giuseros): support Let nodes in AOT
+    CHECK(false) << "Let not yet implemented in AOT";
+  }
+  void VisitExpr_(const TupleGetItemNode* op) override { VisitExpr(op->tuple); }
+  void VisitExpr_(const OpNode* op) override {
+    throw std::runtime_error("can not compile op in non-eta expanded form");
+  }
+  void VisitExpr_(const GlobalVarNode* op) override { throw std::runtime_error(""); }
+  void VisitExpr_(const IfNode* op) override { throw std::invalid_argument("if not supported"); }
+  void VisitExpr_(const FunctionNode* op) override {
+    ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+        << "Only functions supported by custom codegen";
+  }
+  void VisitExpr_(const RefCreateNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefReadNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const RefWriteNode* op) override {
+    throw std::invalid_argument("reference not supported");
+  }
+  void VisitExpr_(const ConstructorNode* op) override {
+    throw std::invalid_argument("ADT constructor case not yet implemented");
+  }
+  void VisitExpr_(const MatchNode* op) override {
+    throw std::invalid_argument("match case not yet implemented");
+  }
+
+  // Create the main PrimFunc to execute the graph
+  tir::PrimFunc CreateMainFunc(unsigned int relay_params) {
+    tir::Stmt body = tir::SeqStmt(stmts_);
+
+    // Allocate the sids
+    std::unordered_map<int, bool> allocated;
+
+    for (auto kv : storage_device_map_) {
+      // Only allocate sids that are needed
+      const bool is_input =
+          (std::find(input_vars_.begin(), input_vars_.end(), kv.first) != input_vars_.end());
+      const bool is_param = (params_by_expr_.find(kv.first) != params_by_expr_.end());
+      if (is_input || is_param) {
+        continue;
+      }
+
+      for (unsigned int i = 0; i < kv.second[0].size(); i++) {
+        int size = kv.second[2][i];
+        int sid = static_cast<int>((kv.second[0][i].as<IntImmNode>())->value);
+
+        if (std::find(return_sid_.begin(), return_sid_.end(), sid) != return_sid_.end()) {
+          continue;
+        }
+
+        // TODO(giuseros): we should allocate this one time outside the PrimFunc
+        // so we don't pay the price of allocation for every inference
+        if (!allocated[sid]) {
+          body = tir::LetStmt(sids_table_[sid], AllocateBackendMemory(size), body);
+        }
+        allocated[sid] = true;
+      }
+    }
+
+    // Define the attributes
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_type, 1, body);
+    body = tir::AttrStmt(PrimExpr(), tvm::tir::attr::device_id, 0, body);
+
+    // Make the PrimFunc
+    return tir::PrimFunc(main_signature_, body, VoidType(), Map<tir::Var, tir::Buffer>(),
+                         DictAttrs(dict_attrs_));
+  }
+
+ protected:
+  /*! \brief module being generated */
+  runtime::Module* mod_;
+  /*! \brief list of input expressions (i.e., variable passed by the user) */
+  std::vector<Expr> input_vars_;
+  /*! \brief input and output variables belonging to the main function signature */
+  Array<tir::Var> main_signature_;
+  /*! \brief target device */
+  TargetsMap targets_;
+  /*! \brief target host */
+  Target target_host_;
+  /*! \brief PrimFunc attributes */
+  Map<String, ObjectRef> dict_attrs_;
+
+  /*!
+   * \brief parameters (i.e. ConstantNodes found in the graph).
+   * These are taken as inputs to the GraphRuntime.
+   * Maps param name to a pair of storage_id and NDArray. At runtime, the storage_id can be
+   * used to lookup the parameter.
+   */
+  std::unordered_map<std::string, runtime::NDArray> params_;
+  /*! \brief mapping between expression and parameters */
+  Map<Expr, String> params_by_expr_;
+  /*! \brief mapping between parameter names ("p0", "p1", etc.) and storage identifiers */
+  std::unordered_map<std::string, int64_t> param_storage_ids_;
+
+  /*! \brief plan memory of device result */
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  std::unordered_map<int, te::Var> sids_table_;
+  /*! \brief lowered funcs */
+  std::unordered_map<std::string, IRModule> lowered_funcs_;
+  /*! \brief name map */
+  std::unordered_map<std::string, size_t> name_map_;
+  /*! \brief compile engine */
+  CompileEngine compile_engine_;
+  /*! \brief GraphPlanMemory module */
+  runtime::Module graph_plan_memory_module_;

Review comment:
       is this used?

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  // (~nbytes + 1) is -nbytes in two's complement, so masking with
+  // (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1) gives the padding needed to
+  // round the allocation up to the next alignment boundary.
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
+  uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
+  uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
+  uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
+
+  if (next_alloc > workspace_end) {
+    return NULL;
+  }
+
+  tvm_runtime_workspace->next_alloc = next_alloc;
+  return current_alloc;
+}
+
+tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
+  tvm_runtime_workspace->next_alloc = ptr;

Review comment:
       should assert here that ptr really is the most recent allocation, i.e. that frees happen in reverse allocation (LIFO) order
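   A minimal sketch of what that check could look like, assuming a new
   `last_alloc` field is added to `tvm_workspace_t` to remember the most
   recent allocation (both the field and the error code are hypothetical):

   tvm_crt_error_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
     // Hypothetical check: frees must mirror allocations in reverse order,
     // so ptr has to be the block returned by the most recent Allocate.
     if (ptr != tvm_runtime_workspace->last_alloc) {  // last_alloc: hypothetical field
       return kTvmErrorPlatformStackAllocBadFree;     // hypothetical error code
     }
     tvm_runtime_workspace->next_alloc = ptr;
     return kTvmErrorNoError;
   }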

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -211,21 +214,34 @@ void CodeGenCHost::PrintGetFuncFromBackend(const std::string& func_name,
   this->stream << "}\n";
 }
 
-void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, int num_args) {
+void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, PrimExpr values,
+                                 int num_args) {
   this->PrintIndent();
+  std::string stack_value = "stack_value";
+  if (const VarNode* stack_value_var = values.as<VarNode>()) {
+    stack_value = stack_value_var->name_hint;
+  }
   std::string ret_val = GetUniqueName("ret_val");
   std::string ret_type_code = GetUniqueName("ret_type_code");
   this->stream << "TVMValue " << ret_val << ";\n";
   this->PrintIndent();
   this->stream << "int " << ret_type_code << ";\n";
   this->PrintIndent();
-  this->stream << "if (TVMFuncCall(" << packed_func_name << ", "
-               << "(TVMValue*) stack_value"
-               << ", "
+
+  if (is_aot_executor_) {

Review comment:
       i'd say that for this PR, we should either:
   1. revert this and just use TVMFuncCall for now in AOT
   2. implement the new TIR node and make this logic dependent on the type of TIR being codegen'd.
   
   I don't think we should merge as-is.
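   For reference, option 1 would keep the AOT call sites in the same shape as
   the existing codegen output, roughly (a sketch; `fused_op` stands in for
   the handle obtained via TVMBackendGetFuncFromEnv):

   /* Option 1 sketch: route AOT calls through TVMFuncCall as well. */
   if (TVMFuncCall(fused_op, (TVMValue*) stack_value, (int*) stack_tcode,
                   num_args, &ret_val, &ret_type_code) != 0) {
     return -1;
   }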

##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <tvm/runtime/crt/stack_allocator.h>
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);

Review comment:
       Does it make sense to store some tag, e.g. just before or after the returned ptr, to help with the assert in Free?
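   
   A hedged sketch of that idea (this is not what the PR implements; the magic value and layout are made up): reserve one extra word per block in Allocate and have Free verify it.
   
       #define STACK_ALLOC_TAG 0xabcd1234u  /* made-up magic value */
   
       /* In Allocate, extend the block by one word and stamp it: */
       *(uint32_t*)next_alloc = STACK_ALLOC_TAG ^ (uint32_t)(uintptr_t)current_alloc;
       tvm_runtime_workspace->next_alloc = next_alloc + sizeof(uint32_t);
   
       /* In Free, the word just below next_alloc must decode back to ptr: */
       uint32_t tag = *(uint32_t*)(tvm_runtime_workspace->next_alloc - sizeof(uint32_t));
       /* tag ^ STACK_ALLOC_TAG == (uint32_t)(uintptr_t)ptr  iff frees are in order */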

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -324,15 +343,20 @@ inline void CodeGenCHost::PrintTernaryCondExpr(const T* op, const char* compare,
 }
 
 runtime::Module BuildCHost(IRModule mod, Target target) {
+  bool is_aot_executor =

Review comment:
       I don't think this should be needed in CodeGenCHost, perhaps with the exception of the function ordering. We could also just print prototypes, too.
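   
   For example, emitting forward prototypes like these near the top of the generated file would remove any ordering constraint (the names and the packed signature tail are illustrative):
   
       /* forward declarations printed before any definition */
       static int32_t fused_nn_conv2d(void* args, void* arg_type_ids, int32_t num_args,
                                      void* out_ret_value, void* out_ret_tcode,
                                      void* resource_handle);
       static int32_t run_func(void* args, void* arg_type_ids, int32_t num_args,
                               void* out_ret_value, void* out_ret_tcode,
                               void* resource_handle);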

##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -274,8 +290,11 @@ void CodeGenCHost::VisitExpr_(const CallNode* op, std::ostream& os) {  // NOLINT
           << "Expected name " << packed_func_name << " to not be taken";
       decl_stream << "static void* " << packed_func_name << " = NULL;\n";
     }
-    this->PrintGetFuncFromBackend(func_name, packed_func_name);
-    this->PrintFuncCall(packed_func_name, num_args);
+    if (!is_aot_executor_) {

Review comment:
       Same as above: we should either implement the new TIR node or revert for now.

##########
File path: tests/python/unittest/test_crt.py
##########
@@ -157,8 +157,8 @@ def @main(%a : Tensor[(1, 2), uint8], %b : Tensor[(1, 2), uint8]) {
         factory = tvm.relay.build(relay_mod, target=TARGET)
 
     with _make_session(workspace, factory.get_lib()) as sess:
-        graph_mod = tvm.micro.create_local_graph_executor(
-            factory.get_json(), sess.get_system_lib(), sess.device
+        graph_mod = tvm.micro.create_local_graph_runtime(

Review comment:
       revert the first line here

##########
File path: tests/python/relay/test_backend_graph_executor.py
##########
@@ -133,7 +133,7 @@ def test_plan_memory():
     storage_ids = set()
     device_types = set()
     for k, v in smap.items():
-        assert len(v) == 2
+        assert len(v) == 3

Review comment:
       Do we want to assert on the `v[2]` element as well?

##########
File path: tests/python/relay/aot/infra.py
##########
@@ -0,0 +1,226 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       I might propose naming this aot_test_util.py

##########
File path: include/tvm/target/target_kind.h
##########
@@ -140,6 +140,12 @@ static constexpr const char* kTvmRuntimeCpp = "c++";
 /*! \brief Value used with --runtime in target specs to indicate the C runtime. */
 static constexpr const char* kTvmRuntimeCrt = "c";
 
+/*! \brief Value used with --executor in target specs to indicate the graph executor. */
+static constexpr const char* kTvmExecutorGraph = "graph";

Review comment:
       We discussed this a bit offline; documenting here. My main question was how we should handle flags like `--executor` and `--runtime` in the Target string; these don't really influence the generated operator implementations and therefore may not belong in autotune logs. We're not near consensus here, but the general thinking was that perhaps a second configuration should be passed to `tvm.relay.build` containing e.g. the Relay compiler configuration.

##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * brief Given an expression return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * brief Copy a variable to the output. This function is mainly used in edge cases

Review comment:
       OK, that makes sense. I'm good with a TODO; the main goal of this PR should just be parity with GraphExecutor.

##########
File path: tests/python/unittest/test_runtime_module_based_interface.py
##########
@@ -526,11 +526,11 @@ def test_debug_graph_executor():
     out = get_output(0).asnumpy()
     tvm.testing.assert_allclose(out, verify(data), atol=1e-5)
 
-    # debug graph executor wrapper
-    debug_g_mod = debug_executor.GraphModuleDebug(
-        complied_graph_lib["debug_create"]("default", dev),
-        [dev],
-        complied_graph_lib.get_json(),
+    # debug graph runtime wrapper
+    debug_g_mod = debug_runtime.GraphModuleDebug(

Review comment:
       at least revert the runtime part of this rename







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620080936



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       Cool! I agree that alignment should be considered in higher-level memory planning work. :)
   However, as shown here, it's trivial to support a global alignment (i.e., addresses produced by the allocator are aligned to a global granularity), which might be beneficial in the meantime.
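   
   For reference, the rounding trick the allocator already uses: `(~nbytes + 1)` is `-nbytes` in two's complement, so masking with `ALIGN - 1` gives the padding that rounds `nbytes` up to the next multiple of a power-of-two `ALIGN`:
   
       /* with TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES == 16 and nbytes == 20: */
       uint32_t offset_bytes = (~20 + 1) & (16 - 1);  /* (-20) & 15 == 12 */
       /* 20 + 12 == 32, the next multiple of 16, so next_alloc stays aligned */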







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619541593



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I am not following the last comment. If you store X bytes, you allocate a block of ceil(X/16)*16 bytes, and then you *additionally* allocate 4 bytes for the tag. So I think there are two issues:
   
   * An additional 4 bytes of memory per allocated block
   * Loss of memory alignment (see the worked numbers below). This can really hurt accelerators, which usually have strong alignment requirements.
   
   For these reasons I would leave it off by default and only enable it for debugging. If we want a feature that enforces the FIFO ordering in production, I would leave that for a subsequent, more focused PR.
   
   About the tests: I tried to follow your suggestion to have a local `crt_config.h`, but `stack_allocator.c` is compiled into `libmemory.a`, which includes the default `crt_config.h`.
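   
   The alignment-loss point in concrete numbers, assuming the 16-byte default:
   
       /* nbytes = 20  -> padded block = 32 bytes, still 16-byte aligned      */
       /* + 4-byte tag -> the next allocation starts at offset 36             */
       /* 36 % 16 != 0 -> every later pointer is misaligned, unless the tag   */
       /* itself is padded out to a full 16 bytes (wasting 16 bytes per block) */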







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r610707236



##########
File path: include/tvm/runtime/crt/aot/tvm_executor.h
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file include/tvm/runtime/crt/aot/tvm_executor.h
+ * \brief TVM Executor for the Ahead-of-Time Runtime
+ *
+ * AOT models are described by the TVM model descriptor format
+ * which can be passed to tvm_runtime_run. These descriptors will be
+ * generated by the AOT compilation process. This can optionally be
+ * augmented with platform specific context to be passed to the TVM
+ * operators.
+ *
+ * Example:

Review comment:
       So, I am not sure that coming up with a common graph_executor/aot_executor API is beneficial here. In AOT the interface is very tiny; in theory it might even be unnecessary. The graph executor carries the burden of parsing the JSON etc., so it would require a more sophisticated interface.
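   
   To make "tiny" concrete, usage is roughly a single call (a sketch based on the names in this header; `tvm_model_t`, the input/output layout, and the exact `tvm_runtime_run` signature are assumptions):
   
       extern tvm_model_t network;        /* descriptor generated by the AOT compiler */
       void* inputs[]  = { input_data };
       void* outputs[] = { output_data };
       tvm_runtime_run(&network, inputs, outputs);  /* essentially the whole API */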







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r617752937



##########
File path: src/relay/backend/build_module.cc
##########
@@ -473,23 +517,25 @@ class RelayBuildModule : public runtime::ModuleNode {
 
     // Relay IRModule -> IRModule optimizations.
     relay_module = Optimize(relay_module, targets_, params);
+
     // Get the updated function.
     auto func = Downcast<Function>(relay_module->Lookup("main"));
 
     // Generate code for the updated function.
-    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
-    graph_codegen_->Init(nullptr, targets_);
-    graph_codegen_->Codegen(func);
-
-    ret_.graph_json = graph_codegen_->GetJSON();
-    ret_.params = graph_codegen_->GetParams();
+    const String executor_str =
+        target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph);

Review comment:
       Oops, I hadn't refreshed the page and didn't see the comments :) @areusch I was seeing a failure in a GPU test. The reason was the following:
   * In order to decide which `FactoryModule` to use, I query the `get_executor_type()` method of the `BuildModule` C++ object.
   * `get_executor_type` queries the `target_host` and then does:
   `executor_str = target_host->GetAttr<String>("executor").value_or(kTvmExecutorGraph)`
   * In the case of `target = "cuda"`, the `target_host` is not defined, so this was crashing.
   
   Two possible solutions:
   * I can work around this by checking for `target_host.defined()` and defaulting to "graph".
   * I can follow what Manupa suggests and move the `executor` to a build parameter.
   
   As I said, the change is not invasive, because the only place where I need the executor is within `build_module.cc`. Please have a look and let me know whether you prefer this or reverting to a target option.
   
   My opinion is that, since it is not an invasive change, we can have it as a build parameter in this PR. What do you think?
   







[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-816755729


   Hi all, 
   I refactored bits and pieces. In particular:
   * Moved everything into crt (with the agreement that in the near future we will remove dependencies on things that embedded platforms might struggle with)
   * Refactored `model_library_format.py` and the executor factories to be a bit more generic and elegant. Please let me know what you think.





[GitHub] [tvm] areusch commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-832909339


   thanks @giuseros for your hard work on this effort and for bearing with me through a long review! and thanks @manupa-arm @jroesch @tqchen @tkonolige @jwfromm for helping to review the PR!





[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608182858



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * brief Given an expression return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * brief Copy a variable to the output. This function is mainly used in edge cases

Review comment:
       In a copy-on-write fashion? Yes, that could be nice, but my fear is that returning an input (i.e., an identity function) is a bit of an edge case. If there is a concrete use case, I am happy to add a TODO here.







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611808333



##########
File path: include/tvm/runtime/crt/aot_executor.h
##########
@@ -68,6 +67,9 @@ extern "C" {
 typedef struct {
 } tvm_context_t;
 
+typedef int32_t(tvm_function_t)(void* args, void* arg_type_ids, int32_t num_args,

Review comment:
       I am happy with this for now. Just be aware that we are working on changing this runner signature to be unpacked (so as to be more embedded-friendly).
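   
   For reference, the two shapes side by side; the packed tail completing the truncated typedef above and the unpacked form are both sketches, not the final signatures:
   
       /* packed (current): every argument boxed behind void* */
       typedef int32_t (tvm_function_t)(void* args, void* arg_type_ids, int32_t num_args,
                                        void* out_ret_value, void* out_ret_tcode,
                                        void* resource_handle);
   
       /* unpacked (planned): plain typed pointers, no boxing on the hot path */
       int32_t run_model(float* input, float* output);  /* illustrative */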







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615043798



##########
File path: tests/python/relay/test_backend_graph_executor.py
##########
@@ -133,7 +133,7 @@ def test_plan_memory():
     storage_ids = set()
     device_types = set()
     for k, v in smap.items():
-        assert len(v) == 2
+        assert len(v) == 3

Review comment:
       Also here, I am not sure exactly what you mean 







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611842671



##########
File path: src/runtime/crt/common/aot_backend_api.c
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+#include <assert.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <tvm/runtime/c_backend_api.h>
+#include <tvm/runtime/crt/error_codes.h>
+#include <tvm/runtime/crt/logging.h>
+#include <tvm/runtime/crt/platform.h>
+
+#include "crt_config.h"
+
+void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes, int dtype_code_hint,

Review comment:
       My main issue is about `TVMBackendRegisterSystemLibSymbol`. This is a `TVMBackend` API, and I feel it should stay in crt_backend. The point is that the AOT backend and the CRT backend are different, which is why I created this file. Are you proposing to put `TVMBackendRegisterSystemLibSymbol` somewhere in `crt/runtime`? Anyway, this seems a minor point. Let's agree on a proposal that avoids forcing the platform to define a fake `TVMBackendRegisterSystemLibSymbol`, since it won't be used by AOT.
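   
   One way to avoid the fake definition, sketched (a default no-op stub shipped with the AOT backend; whether a plain definition or weak linkage is preferable for all supported toolchains is an open question):
   
       /* AOT resolves operator calls at link time, so system-lib symbol
        * registration is never exercised at runtime; a no-op default keeps
        * platforms from having to define a fake one themselves. */
       int TVMBackendRegisterSystemLibSymbol(const char* name, void* ptr) {
         (void)name;
         (void)ptr;
         return 0;
       }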







[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611859910



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       It would be great to clarify the total number of changes needed for the AOT executor. Ideally, the fact that we are targeting the AOT executor should not impact the code generator; instead we should have generic arguments, e.g. a list of symbols to be exposed.







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608181342



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the

Review comment:
       I am not sure I am following here: as of now I am using `lookup_linked_param`; I am only using `extern` because of the return limitation in the C target. In the future, the idea would be to get rid of this function call by adding a builtin like `builtin::lookup_linked_param` that can directly access the static parameters defined in the compilation unit.







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608905836



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param  type the type of allocation
+   * \param num the number of variable to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_value", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases

Review comment:
       effectively, yeah. I don't know that it's a mainline use case, but what's your use case in returning a param from the top-level AOT func? isn't that the same use case?







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608905836



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for graph runtime */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_code", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return value(s). A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases

Review comment:
       effectively, yeah. I don't know that it's a mainline use case, but what's your use case in returning a var from the top-level AOT func? isn't that the same use case?







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r611868584



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       So, the main change in the code generator is for the packed call to not use the function registry (but to call directly into the library). If I understand your comment correctly, you are suggesting to have a `use_function_registry` flag and set it to `false` when we use AOT. I am happy with that, although in the LLVM codegen path we explicitly pass things like `is_crt` etc...
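   To illustrate the difference, a rough sketch of the two call styles (the `fused_add` symbol, the argument layout, and the variable names are illustrative, not the exact generated code):

       TVMValue stack_value[2];
       int stack_tcode[2];
       TVMValue ret_val;
       int ret_tcode;

       /* Function-registry style: resolve the operator at runtime,
        * then dispatch through TVMFuncCall. */
       TVMFunctionHandle op;
       TVMBackendGetFuncFromEnv(__tvm_module_ctx, "fused_add", &op);
       TVMFuncCall(op, stack_value, stack_tcode, 2, &ret_val, &ret_tcode);

       /* AOT style: the operator symbol is known at link time, so the
        * generated code calls it directly; no registry lookup is needed. */
       fused_add(stack_value, stack_tcode, 2, &ret_val, &ret_tcode, NULL);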







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608194704



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -211,21 +219,34 @@ void CodeGenCHost::PrintGetFuncFromBackend(const std::string& func_name,
   this->stream << "}\n";
 }
 
-void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, int num_args) {
+void CodeGenCHost::PrintFuncCall(const std::string& packed_func_name, PrimExpr values,
+                                 int num_args) {
   this->PrintIndent();
+  std::string stack_value = "stack_value";
+  if (const VarNode* stack_value_var = values.as<VarNode>()) {
+    stack_value = stack_value_var->name_hint;
+  }
   std::string ret_val = GetUniqueName("ret_val");
   std::string ret_type_code = GetUniqueName("ret_type_code");
   this->stream << "TVMValue " << ret_val << ";\n";
   this->PrintIndent();
   this->stream << "int " << ret_type_code << ";\n";
   this->PrintIndent();
-  this->stream << "if (TVMFuncCall(" << packed_func_name << ", "
-               << "(TVMValue*) stack_value"
-               << ", "
+
+  if (is_aot_executor_) {
+    this->stream << "if (" << packed_func_name << "( "
+                 << "(TVMValue*) " << stack_value;
+  } else {
+    this->stream << "if (TVMFuncCall(" << packed_func_name << ", "
+                 << "(TVMValue*) stack_value";
+  }
+  this->stream << ", "
                << "(int*) stack_tcode"
                << ", " << num_args << ", "
-               << "&" << ret_val << ", "
-               << "&" << ret_type_code << ") != 0) {\n";
+               << "&" << ret_val << ", ";
+  this->stream << "&" << ret_type_code;
+  this->stream << (is_aot_executor_ ? ", NULL" : "") << ") != 0) {\n";

Review comment:
       So this is because I am calling the operators directly, and the last argument of a packed function is a `context_handle`, which I am setting to NULL (for now)
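   For reference, a sketch of the c-packed signature being targeted (the typedef name is invented for illustration; the trailing `resource_handle` is the context handle that gets NULL above):

       typedef int32_t (*tvm_cpacked_fn)(TVMValue* args, int* type_codes,
                                         int num_args, TVMValue* ret_value,
                                         int* ret_type_code, void* resource_handle);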







[GitHub] [tvm] areusch commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-832876837


   @manupa-arm please take another look and explicitly approve if you're okay with this!
   https://tvm.apache.org/docs/contribute/code_review.html#approve-and-request-changes-explicitly





[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-821404672


   Hi @areusch , 
   Thanks for your comments. I left some questions where I didn't get exactly what you meant (sorry :) ). The major thing I will do next week is to introduce a new intrinsic to support the cpacked_func and re-upload the PR.





[GitHub] [tvm] areusch commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-828602244


   @manupa-arm @giuseros I discussed this with @tqchen a bit. The overarching problem is the Target string is used to key autotvm logs, so adding non-autotuning attrs to Target means that tuning logs can't be directly used when those attrs change. To address this, we will either need: a) a new e.g. runtime-options object or b) a way to filter the non-autotvm related attrs from Target when building the tuning log key. A related question is whether or not we have a way to raise an exception when a user specifies an unregistered Target key--I can't remember offhand but suspect @zxybazh knows. This would become more important when we split config into two objects and need to provide feedback when the user accidentally specifies e.g. executor as part of Target.
   
   We also discussed that it's possible executor could have a place in Target in the future, should graph-level tuning further evolve. However, here we need to implement something that reflects the present-day state of the repo. We have two options proposed:
   1. add `--executor=` with no default to Target, which will not affect tune logs except in cases where aot executor is used.
   2. add `executor=` as a `relay.build` kwarg.
   
   Both of these are hacky--the proper thing is something like the a) or b) solutions proposed above. Between hacks 1 and 2, the downside of 1 has been spelled out earlier here, and the downside of 2 is that it creates a fairly central kwarg we'll have to deprecate in the near future. I guess I have a preference for solution 1): since we already include `--runtime` in Target and will need to "deprecate" the `--runtime` Target attr as well, we would be deprecating things in a single place rather than in two.





[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620537842



##########
File path: src/relay/backend/build_module.cc
##########
@@ -131,14 +164,14 @@ class RelayBuildModule : public runtime::ModuleNode {
   PackedFunc GetFunction(const std::string& name, const ObjectPtr<Object>& sptr_to_self) final {
     if (name == "get_graph_json") {
       return PackedFunc(
-          [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->GetGraphJSON(); });
+          [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->GetJSON(); });

Review comment:
       should we keep this as GetGraphJSON?

##########
File path: src/target/source/source_module.cc
##########
@@ -191,17 +192,36 @@ class CSourceCrtMetadataModuleNode : public runtime::ModuleNode {
           << "}\n";
   }
 
+  void GenerateAOTDescriptor() {
+    code_ << "#include \"tvm/runtime/crt/internal/aot_executor/aot_executor.h\"\n";

Review comment:
       maybe should promote the tvm_model_t to a non-internal header though? I think we consider the operator code as external to CRT

##########
File path: src/target/source/codegen_source_base.h
##########
@@ -155,7 +156,8 @@ runtime::Module CSourceModuleCreate(const String& code, const String& fmt,
  */
 runtime::Module CreateMetadataModule(
     const std::unordered_map<std::string, runtime::NDArray>& params, runtime::Module target_module,
-    const Array<runtime::Module>& ext_modules, Target target);
+    const Array<runtime::Module>& ext_modules, Target target,

Review comment:
       given this is just invoked from build_module.cc, maybe we should make metadata have no default?

##########
File path: python/tvm/relay/build_module.py
##########
@@ -213,7 +218,7 @@ def _build_module_no_factory(mod, target=None, target_host=None, params=None, mo
     return build(mod, target, params=params, mod_name=mod_name).module
 
 
-def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"):
+def build(ir_mod, target=None, target_host=None, params=None, mod_name="default", executor="graph"):

Review comment:
       cc @tqchen @joresch would be great to get feedback on API changes here.

##########
File path: python/tvm/relay/backend/executor_factory.py
##########
@@ -14,21 +14,125 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-"""Graph executor factory."""
+"""Executor factory modules."""
+from abc import abstractmethod
 import warnings
+
+from tvm import tir
+
 from ..._ffi.base import string_types
 from ..._ffi.registry import get_global_func
 from ...runtime import ndarray
 
 
-class GraphExecutorFactoryModule:
+class ExecutorFactoryModule:
+    """Common interface for executor factory modules
+    This class describes the common API of different
+    factory modules
+    """
+
+    @abstractmethod
+    def get_internal_repr(self):
+        """Common function to return the internal representation

Review comment:
       can you address this one?







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608168664



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -86,10 +86,13 @@ def _build_memory_map(graph_json):
     list :
         A list with one entry per storage id describing that memory.
     """
-    graph = json.loads(graph_json)
+    memory_map = []
+    if graph_str.startswith("primfn"):

Review comment:
       So, OK. The main point here is that I was trying to unify a "graph" representation. When we do: `graph, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)`:
   * If we are using the graph executor, `graph` is a string containing the json
   * If we are using the aot executor, `graph` is a string containing the string representation of the IRModule (containing the `tvm__run_func` PrimFunc)
   That's why there is that check. The graph string will represent two different things. In the case of the JSON I am extracting the memory map, while in the case of AOT I am simply returning an empty list. Two questions:
   * Should we return a memory map for AOT as well? I thought that was used mostly by the graph executor
   * Should I move the logic to produce the memory map into `export_model_library_format`? I.e., something like: `if is_aot memory_map=[] else memory_map=_build_memory_map(mod.graph)`
   







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614807239



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Hi @areusch ,
   As I said on the discuss post, I agree about having a separate discussion on Memory management, PackedFunc and firmware-facing API. I would say that none of this is covered in this PR. In this PR, we have:
   * Relay->TIR codegen (this is the main part)
   * Stack allocator (this is only an add-on, which can be exposed through the usual PlatformAllocator, as for the normal paged allocator)
   * Got rid of function registry when calling the functions in AOT
   * added a minimalistic tvm_runtime_run function in aot_executor.h that calls the PackedFunc contained in the tvm_model_t struct (a rough sketch of this idea is below). 
   The only way I affect the CRT "interface" in this PR is by adding a new implementation of `c_backend_api.h` in `aot_backend_api.c` to get rid of `TVMFuncRegisterGlobal` which we don't use anymore. If you wish, I can use `crt_backend_api.c` and add a fake implementation of `TVMFuncRegisterGlobal` to work around the undefined symbol
   So I would propose to finish this review, and start tackling the other problems in separate RFCs
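   For context, a minimal sketch of that runner-function idea (the `tvm_model_t` layout, the entry signature, and the wiring are assumptions for illustration, not the final API):

       #include <tvm/runtime/c_runtime_api.h>

       /* Assumed packed-C entry signature and model descriptor. */
       typedef int32_t (*tvm_packed_entry_fn)(TVMValue* args, int* type_codes,
                                              int num_args, TVMValue* ret_value,
                                              int* ret_tcode, void* resource_handle);
       typedef struct {
         tvm_packed_entry_fn run_func;  /* the compiler-generated network runner */
       } tvm_model_t;

       static int32_t tvm_runtime_run(const tvm_model_t* model, TVMValue* args,
                                      int* type_codes, int num_args) {
         TVMValue ret_value;
         int ret_tcode;
         /* Call straight into the generated entry point: no registry and
          * no graph interpretation at runtime. */
         return model->run_func(args, type_codes, num_args, &ret_value,
                                &ret_tcode, NULL);
       }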







[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-828309661


   Hi @areusch , @tqchen , @manupa-arm , 
   About the target vs build discussion. I think we all agree that the executor should not be part of the target, but it should be a build parameter. 
   
   The question is if adding the build parameter now, or if doing it in another PR. The point is that the executor is really only used in `build_module.cc` , so moving it as a build parameter seems the best choice. This also avoids hacky workarounds in situations where the `target_host` is not defined (e.g., `cuda`). 
   
   I understand the argument to leave it as a target option now and then move all the target options into an options object in a later PR. But I would prefer to reduce the number of hacks for AOT from day 1, and in a later PR try to bring crt and link-params in line with AOT. In other words, if there are no drawbacks, let's make AOT the "proper" way and then address the other target options so they align with AOT. Thoughts?





[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-829117394


   Hi @areusch ,
   Should we settle for a middle-ground solution? I moved back the python interface of `tvm.relay.build`, which now retrieves the executor from the target. The inner functions, though, still accept an `executor` parameter. In this way we maintain code cleanliness (no need to query the target from multiple places) while preserving the external interface. What do you think?





[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614875602



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Hi @tqchen. I see what you are proposing and I think it is a very good idea. I will try to give it a go!







[GitHub] [tvm] tqchen commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
tqchen commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r621396937



##########
File path: include/tvm/tir/builtin.h
##########
@@ -386,6 +396,21 @@ TVM_DLL const Op& tvm_thread_context();
  */
 TVM_DLL const Op& tvm_call_packed_lowered();
 
+/*!
+ * \brief Lowered version of call c-packed, the space of value and
+ *  type codes are explicitly allocated.
+ *
+ *  int tvm_call_packed_lowered(fname,

Review comment:
       call cpacked

##########
File path: include/tvm/tir/builtin.h
##########
@@ -343,6 +343,16 @@ TVM_DLL const Op& tvm_stack_make_array();
  */
 TVM_DLL const Op& tvm_call_packed();
 
+/*!
+ * \brief See pseudo code
+ *
+ *  int tvm_call_cpacked(fname, TVMValue* args) {
+ *     (*fname)(args, type_code_of(args), len(args));
+ *     return 0;
+ *  }
+ */
+TVM_DLL const Op& tvm_call_cpacked();

Review comment:
       in particular, we might also want to support POD return types

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -0,0 +1,674 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_executor_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;

Review comment:
       seems the particular type is not being used here

##########
File path: include/tvm/tir/builtin.h
##########
@@ -343,6 +343,16 @@ TVM_DLL const Op& tvm_stack_make_array();
  */
 TVM_DLL const Op& tvm_call_packed();
 
+/*!
+ * \brief See pseudo code
+ *
+ *  int tvm_call_cpacked(fname, TVMValue* args) {
+ *     (*fname)(args, type_code_of(args), len(args));
+ *     return 0;
+ *  }
+ */
+TVM_DLL const Op& tvm_call_cpacked();

Review comment:
       https://github.com/apache/tvm/pull/7932 contains some updated clarifications that we might want to incorporate here

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -0,0 +1,674 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/graph_codegen.cc
+ * \brief Graph runtime codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;

Review comment:
       would be great if we can avoid use of any







[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-827797374


   Hi all, 
   In addition to applying @tqchen's comments, I also added:
   * A test for the stack allocator to verify that there is space to store the tags
   * Using `abs(x-ref)` to test AOT for float data types





[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608423833



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -132,14 +135,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = False
+    for v in mod.target.values():
+        if v.attrs.get("executor", "graph_runtime") == "aot":
+            is_aot = True
+            break
+
+    runtime = ["graph"]
+    if is_aot:
+        runtime = ["aot"]
+
     metadata = {
         "version": 1,
         "model_name": mod.libmod_name,
         "export_datetime": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%SZ"),
-        "memory": _build_memory_map(mod.graph_json),
+        "memory": _build_memory_map(mod.graph),
         "target": {int(k): str(v) for k, v in mod.target.items()},
-        "runtimes": ["graph"],
+        "runtimes": runtime,

Review comment:
       @areusch I think we should deprecate/change the term "runtimes" here as well in MLF.

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -156,10 +170,11 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
     with open(tempdir.relpath("relay.txt"), "w") as f:
         f.write(str(mod.ir_mod))
 
-    graph_config_dir_path = tempdir.relpath(os.path.join("runtime-config", "graph"))
-    os.makedirs(graph_config_dir_path)
-    with open(os.path.join(graph_config_dir_path, "graph.json"), "w") as f:
-        f.write(mod.graph_json)
+    if not is_aot:

Review comment:
       I think we should just use the factored-out function which determines the runtime, and moreover I think it's better if we encapsulate into a function whether "mod" requires config data and, if so, what that would be, e.g., for "graph" it would be the json, etc. That way it would be extensible when configs need to be added based on the executor. @areusch @giuseros WDYT?

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -132,14 +135,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = False
+    for v in mod.target.values():
+        if v.attrs.get("executor", "graph_runtime") == "aot":
+            is_aot = True
+            break
+
+    runtime = ["graph"]
+    if is_aot:
+        runtime = ["aot"]
+
     metadata = {
         "version": 1,
         "model_name": mod.libmod_name,
         "export_datetime": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%SZ"),
-        "memory": _build_memory_map(mod.graph_json),
+        "memory": _build_memory_map(mod.graph),

Review comment:
       I think we should gate this for the aot executor until _build_memory_map's dependency on the presence of the json is resolved.

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -132,14 +135,25 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
         Path to the .tar archive to generate.
     """
     tempdir = utils.tempdir()
+    is_aot = False
+    for v in mod.target.values():

Review comment:
       I think we should factor this out into a separate function that finds the "executor", and rename the variable to "executor".

##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -86,10 +86,13 @@ def _build_memory_map(graph_json):
     list :
         A list with one entry per storage id describing that memory.
     """
-    graph = json.loads(graph_json)
+    memory_map = []
+    if graph_str.startswith("primfn"):

Review comment:
       See the discussion here as well : https://github.com/apache/tvm/pull/7533.
   
   I think we should not overload the graph here.
   Honestly, the model library format should not be affected by which executor is chosen; however, it's affected today because it relies on the presence of the "json", as I've said in the discussion above.
   
   Therefore, I think we should fix that while sorting out the tensor pinning infra. Until then, we can leave out creating the memory map in the aot executor, because it's a limitation of the model library format creation that it depends on the presence of the graph executor. 

##########
File path: python/tvm/relay/build_module.py
##########
@@ -287,7 +289,8 @@ def build(ir_mod, target=None, target_host=None, params=None, mod_name="default"
 
     with tophub_context:
         bld_mod = BuildModule()
-        graph_json, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
+
+        graph, runtime_mod, params = bld_mod.build(mod=ir_mod, target=target, params=params)
         executor_factory = _graph_executor_factory.GraphExecutorFactoryModule(

Review comment:
       It's a bit weird that we are using this in the AoT Executor. I have a feeling that we need an AOTExecutorFactoryModule that does not have a graph. Ideally, I think we would need an abstract class ExecutorFactoryModule that implements the methods required by export_model_library_format. Members such as "graph" should not be present in AOTExecutorFactoryModule.

##########
File path: python/tvm/relay/backend/graph_executor_factory.py
##########
@@ -41,17 +41,18 @@ class GraphExecutorFactoryModule:
         The parameters of module
     """
 
-    def __init__(self, ir_mod, target, graph_json_str, libmod, libmod_name, params):

Review comment:
       I think we should not overload this as I've said in the other comments







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r615011186



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       @giuseros I just realized I forgot one other point:
 - do we need to have aot_backend_api.c now? it would be great to unify with crt_backend_api.c, and simply not pass `--system-lib` for now with AOT executor. 
   
   It seems like we may be able to get away with this for now, since it wouldn't be called; then we avoid the complexity of multiple _backend_api.c implementations. In the future, we can figure out either a compile-time `#define` or another way to avoid linking in TVMBackendRegisterSystemLibSymbol.
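   To make that idea concrete, a rough sketch of the compile-time gate (the macro name is invented for illustration):

       /* crt_backend_api.c (sketch): compile the registry entry point out
        * for AOT builds instead of keeping a second *_backend_api.c file. */
       #ifndef TVM_CRT_DISABLE_SYSTEM_LIB
       int TVMBackendRegisterSystemLibSymbol(const char* name, void* ptr) {
         /* ... existing system-lib registry implementation ... */
         return 0;
       }
       #endif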










[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619384477



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I'm not following what you mean by "other code"? The intention is for this to be used only by TVMBAW (TVMBackendAllocWorkspace) calls generated by TVM codegen. tvm_runtime_workspace->next_alloc should only be accessed/written by StackMemoryManager_Allocate and Free. 
   
   Are you talking about a scenario where tvm_runtime_workspace is accidentally accessed by other user code? (I'm trying to figure out how this error could be triggered outside of running generated code.)
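   For reference, the intended LIFO usage pattern looks roughly like this (the `StackMemoryManager_Init` helper's name and signature are assumptions for illustration; the buffer is assumed to meet TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES alignment):

       #include <tvm/runtime/crt/stack_allocator.h>

       static uint8_t g_workspace[4096];
       static tvm_workspace_t app_workspace;

       void run_once(void) {
         /* Hand the allocator its backing buffer (assumed init helper). */
         StackMemoryManager_Init(&app_workspace, g_workspace, sizeof(g_workspace));

         /* Generated TVMBackendAllocWorkspace calls land here, in order. */
         void* a = StackMemoryManager_Allocate(&app_workspace, 256);
         void* b = StackMemoryManager_Allocate(&app_workspace, 512);

         /* Frees must mirror the allocations: most recent first. */
         StackMemoryManager_Free(&app_workspace, b);
         StackMemoryManager_Free(&app_workspace, a);
       }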







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r622140354



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+// LINT_C_FILE
+#include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
+#include <tvm/runtime/crt/logging.h>
+#endif
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
+  // reserve bytes at the end of the allocation such that
+  // next_alloc % TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES == 0.
+  uint32_t offset_bytes =
+      (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - nbytes) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
+  uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
+  uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
+  uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
+#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
+  if (next_alloc + STACK_ALLOCATOR_TAG_SIZE_BYTES > workspace_end) {
+    return NULL;
+  }
+  const uint32_t total_size = (nbytes + offset_bytes + STACK_ALLOCATOR_TAG_SIZE_BYTES);
+  *((uint32_t*)next_alloc) = total_size ^ STACK_ALLOCATOR_TAG;
+  next_alloc += STACK_ALLOCATOR_TAG_SIZE_BYTES;
+#endif
+  if (next_alloc > workspace_end) {
+    return NULL;
+  }
+
+  tvm_runtime_workspace->next_alloc = next_alloc;
+  return current_alloc;

Review comment:
       This should be an argument to the function, populated here, rather than the return value.
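   i.e., something along these lines (a sketch of the suggested shape; the `tvm_crt_error_t` return codes are assumptions, and the FIFO-check branch is omitted for brevity):

       #include <tvm/runtime/crt/error_codes.h>

       tvm_crt_error_t StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace,
                                                   int32_t nbytes, void** current_alloc) {
         uint32_t offset_bytes =
             (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - nbytes) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
         uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
         uint8_t* workspace_end =
             tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
         if (next_alloc > workspace_end) {
           return kTvmErrorPlatformNoMemory;
         }
         /* Report the allocation through the out-parameter instead. */
         *current_alloc = tvm_runtime_workspace->next_alloc;
         tvm_runtime_workspace->next_alloc = next_alloc;
         return kTvmErrorNoError;
       }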







[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619372859



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I agree that if illegal code is "generated", it will result in garbage calculations or in overrunning the buffer provided to the allocator, if we are using this to handle workspace data.
   
   Even if this gives an error (an illegal free), that should mean a bug in the codegen (a very unlikely one if everything is lowered from TIR), in which case the codegen developer needs to fix it. Therefore, I do think the developer should investigate with DEBUG mode to see if this is causing the error and fix the codegen. 
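   For clarity, a sketch of the matching check on the free path (mirroring the tag written by the Allocate quoted above; the return type and exact error handling are illustrative):

       int32_t StackMemoryManager_Free(tvm_workspace_t* tvm_runtime_workspace, void* ptr) {
       #ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
         /* Recover the tag written just below next_alloc by the matching
          * Allocate call and verify this free is in LIFO order. */
         uint32_t tag =
             *(uint32_t*)(tvm_runtime_workspace->next_alloc - STACK_ALLOCATOR_TAG_SIZE_BYTES);
         uint32_t total_size = tag ^ STACK_ALLOCATOR_TAG;
         if (tvm_runtime_workspace->next_alloc - total_size != (uint8_t*)ptr) {
           return -1;  /* out-of-order free: almost certainly a codegen bug */
         }
         tvm_runtime_workspace->next_alloc -= total_size;
       #else
         tvm_runtime_workspace->next_alloc = (uint8_t*)ptr;
       #endif
         return 0;
       }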







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619364867



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I would push back with: in a deployment scenario, what `$pc` do you want logged when there is memory corruption?







[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608904060



##########
File path: python/tvm/micro/model_library_format.py
##########
@@ -156,10 +170,11 @@ def export_model_library_format(mod: graph_executor_factory.GraphExecutorFactory
     with open(tempdir.relpath("relay.txt"), "w") as f:
         f.write(str(mod.ir_mod))
 
-    graph_config_dir_path = tempdir.relpath(os.path.join("runtime-config", "graph"))
-    os.makedirs(graph_config_dir_path)
-    with open(os.path.join(graph_config_dir_path, "graph.json"), "w") as f:
-        f.write(mod.graph_json)
+    if not is_aot:

Review comment:
       @manupa-arm just to clarify, do you mean factoring out this logic into a graph-specific function and invoking it based on the executor type? If so, I agree with that. I'm not convinced AOT will always require zero configuration, particularly given the conversation in https://discuss.tvm.apache.org/t/mapping-tensorir-te-to-heterogenous-systems/9617







[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608185624



##########
File path: src/relay/backend/aot_codegen.cc
##########
@@ -0,0 +1,675 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file relay/backend/aot_codegen.cc
+ * \brief AOT executor codegen
+ */
+
+#include <dmlc/any.h>
+#include <tvm/ir/module.h>
+#include <tvm/relay/expr_functor.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/tir/builtin.h>
+#include <tvm/tir/expr.h>
+#include <tvm/tir/stmt.h>
+
+#include <algorithm>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "../../runtime/meta_data.h"
+#include "compile_engine.h"
+#include "utils.h"
+
+namespace tvm {
+namespace relay {
+namespace backend {
+
+using IntegerArray = Array<Integer>;
+using ShapeVector = std::vector<std::vector<int64_t>>;
+using GraphAttrs = std::unordered_map<std::string, dmlc::any>;
+using TargetsMap = std::unordered_map<int, Target>;
+
+/*! \brief Lowered outputs */
+struct AOTLoweredOutput {
+  std::string graph_tir;
+  Map<String, IRModule> lowered_funcs;
+  Array<tvm::runtime::Module> external_mods;
+  std::unordered_map<std::string, std::pair<int, const tvm::runtime::NDArray>> params;
+  runtime::AOTMetadata aot_metadata;
+};
+
+class AotReturnSidVisitor : public ExprVisitor {
+ public:
+  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+
+  IntegerArray FindReturnSid(Function func) {
+    VisitExpr(func->body);
+    return return_sid_;
+  }
+
+ protected:
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_sid_ = (*iter).second[0];
+    }
+  }
+
+  void VisitExpr_(const ConstantNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const VarNode* vn) override {
+    ExprVisitor::VisitExpr_(vn);
+    AssignReturnSid(GetRef<Expr>(vn));
+  }
+
+  void VisitExpr_(const CallNode* cn) override {
+    ExprVisitor::VisitExpr_(cn);
+    AssignReturnSid(GetRef<Expr>(cn));
+  }
+
+  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+
+  void VisitExpr_(const TupleNode* tn) override {
+    ExprVisitor::VisitExpr_(tn);
+    AssignReturnSid(GetRef<Expr>(tn));
+  }
+
+ private:
+  Map<Expr, Array<IntegerArray>> storage_device_map_;
+  IntegerArray return_sid_;
+};
+
+using TIRNetwork = tvm::Array<tir::Stmt>;
+
+/*! \brief Code generator for the AOT executor */
+class AOTCodegen : public ExprVisitor {
+ protected:
+  /*!
+   * \brief Utility function to allocate a DLTensor or TVMValue
+   * \param type the type of allocation
+   * \param num the number of variables to allocate on the stack
+   * \return PrimExpr representing the allocated object
+   */
+  PrimExpr StackAlloca(std::string type, size_t num) {
+    Array<PrimExpr> args = {tir::StringImm(type), ConstInt32(num)};
+    return tir::Call(DataType::Handle(), tir::builtin::tvm_stack_alloca(), args);
+  }
+
+  /*!
+   * \brief Utility function to allocate memory for storage identifiers
+   * \param  memory_size_byte size in bytes of the allocation
+   * \return PrimExpr representing the allocated memory
+   */
+  PrimExpr AllocateBackendMemory(int memory_size_byte) {
+    // TODO(giuseros): use tir::Allocate instead of TVMBackendAllocWorkspace
+    // to enable unified memory planning
+    static const Op& op = Op::Get("tir.TVMBackendAllocWorkspace");
+    return tvm::tir::Call(DataType::Handle(), op, {1, 0, memory_size_byte, 2, 8});
+  }
+
+  /*!
+   * \brief Utility function to convert a concrete integer to a PrimExpr.
+   * \param num the number to convert
+   * \return PrimExpr representing num
+   */
+  inline PrimExpr ConstInt32(size_t num) {
+    ICHECK_LE(num, std::numeric_limits<int>::max());
+    return tir::make_const(DataType::Int(32), static_cast<int>(num));
+  }
+
+  /*!
+   * \brief Return a vector of variables that represents the sids for the given Relay Expr
+   */
+  std::vector<tir::Var> pack_sid(Expr expr) {
+    Array<IntegerArray> sids = storage_device_map_[expr];
+    std::vector<tir::Var> sid_vars;
+
+    // Note that an expression can have multiple sids associated with it
+    // e.g., returning multiple values from a function
+    for (const auto& sid : sids[0]) {
+      // Determine if an sid is an output buffer
+      int sid_int = static_cast<int>((sid.as<IntImmNode>())->value);
+      auto output_iter = std::find(return_sid_.begin(), return_sid_.end(), sid_int);
+      if (output_iter != return_sid_.end()) {
+        int output_index = std::distance(return_sid_.begin(), output_iter);
+        sid_vars.push_back(main_signature_[input_vars_.size() + output_index]);
+        continue;
+      }
+      // Pack the sid inside the TVMValue
+      auto sid_array = te::Var(make_string("sid_", sid, "_value"), DataType::Handle());
+      auto sid_value = sids_table_[sid];
+      tvm::PrimExpr set_tensor =
+          tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                         {sid_array, 0, tir::builtin::kArrData, sid_value});
+      stmts_.push_back(tir::LetStmt(sid_array, StackAlloca("array", 1), tir::Evaluate(set_tensor)));
+      sid_vars.push_back(sid_array);
+    }
+    return sid_vars;
+  }
+
+  /*!
+   * \brief Utility function to return a parameter associated with an expression
+   * \param expr Relay Expression associated with the parameter
+   * \return Variable that represents the DLTensor associated with the parameters
+   */
+  tir::Var pack_param(Expr expr) {
+    // TODO(giuseros): Using call_extern to call into lookup_linked_param. This is because the
+    // builtin::ret is not supported yet in the c target. Once return is supported we can use
+    // tvm_call_packed_lowered().
+    int param_sid = param_storage_ids_[reverse_params_lookup_[expr]];
+    auto lookup_linked_param_fn = tir::StringImm(::tvm::runtime::symbol::tvm_lookup_linked_param);
+    auto param_array = te::Var(make_string("param_", param_sid, "_array"), DataType::Handle());
+
+    // Compose the lookup_call using a local stack
+    Array<tir::Stmt> lookup_call;
+    auto param_var = te::Var(make_string("param_", param_sid, "_value"), DataType::Handle());
+    auto ret_var = te::Var("ret_value", DataType::Handle());
+    auto ret_code = te::Var("ret_code", DataType::Handle());
+
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_var, 0, tir::builtin::kTVMValueContent, ConstInt32(param_sid)})));
+    lookup_call.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Handle(), tir::builtin::call_extern(),
+                       {lookup_linked_param_fn, param_var, 0, 0, ret_var, ret_code, 0})));
+    auto ret_var_handle = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                         {ret_var, 0, tir::builtin::kTVMValueContent});
+
+    // Set the param to the value returned by lookup_call
+    tvm::PrimExpr set_param_array =
+        tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
+                       {param_array, 0, tir::builtin::kArrData, ret_var_handle});
+    lookup_call.push_back(tir::Evaluate(set_param_array));
+
+    tir::Stmt lookup_body = tir::SeqStmt(lookup_call);
+
+    // Allocate the DLTensors on the stack
+    lookup_body = tir::LetStmt(param_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_var, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(ret_code, StackAlloca("arg_value", 1), lookup_body);
+    lookup_body = tir::LetStmt(param_array, StackAlloca("arg_value", 1), lookup_body);
+    stmts_.push_back(lookup_body);
+    return param_array;
+  }
+
+  /*!
+   * \brief Given an expression, return the variable(s) associated with that expression
+   */
+  std::vector<te::Var> find_expr(Expr arg) {
+    auto input_iter = std::find(input_vars_.begin(), input_vars_.end(), arg);
+    if (input_iter != input_vars_.end()) {
+      // Input variable
+      int main_index = std::distance(input_vars_.begin(), input_iter);
+      return {main_signature_[main_index]};
+    } else if (reverse_params_lookup_.find(arg) != reverse_params_lookup_.end()) {
+      // Parameter of the network
+      return {pack_param(arg)};
+    } else {
+      // Storage identifier (i.e., intermediate memory)
+      return pack_sid(arg);
+    }
+  }
+
+  /*!
+   * \brief Call a function with a given name
+   */
+  void func_call(Call call, std::string func_name) {
+    tvm::Array<PrimExpr> args{tvm::tir::StringImm(func_name)};
+    std::vector<tir::Stmt> func_call_stmts;
+
+    // Pack the inputs
+    for (Expr arg : call->args) {
+      auto var_arg = find_expr(arg);
+      args.push_back(var_arg[0]);
+    }
+
+    auto ret_expr = Downcast<Expr>(call);
+
+    // Pack the return(s) value. A call node can produce multiple outputs
+    for (const auto& var : pack_sid(ret_expr)) {
+      args.push_back(var);
+    }
+
+    // Use tvm_call_packed to execute the function
+    func_call_stmts.push_back(tir::Evaluate(
+        tvm::tir::Call(DataType::Int(32), tvm::tir::builtin::tvm_call_packed(), args)));
+    tir::Stmt body = tir::SeqStmt(func_call_stmts);
+    stmts_.push_back(body);
+  }
+
+  /*!
+   * \brief Copy a variable to the output. This function is mainly used in edge cases
+   * when we want to return an input or a parameter.
+   */
+  void copy_to_output(te::Var out, te::Var in, size_t size) {
+    auto retval_get = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                     {in, 0, tir::builtin::kArrData});
+
+    // Define intermediate DLTensor to load/store the data
+    auto tmp0 = te::Var("tmp0", DataType::Handle());
+    auto tmp1 = te::Var("tmp1", DataType::Handle());
+    te::Var loop_idx("i", DataType::Int(32));
+    auto retval_i = tir::Load(DataType::UInt(8), tmp0, loop_idx, tir::const_true());
+    auto tostore = tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_get(),
+                                  {out, 0, tir::builtin::kArrData});
+
+    // Copy the variable from the input to the output
+    tir::Stmt copy = tir::For(
+        loop_idx, 0, ConstInt32(size), tir::ForKind::kSerial,
+        tir::Store(tmp1, tir::Let(tmp0, retval_get, retval_i), loop_idx, tir::const_true()));
+    stmts_.push_back(tir::LetStmt(tmp1, tostore, copy));
+  }
+
+  /*!
+   * \brief Utility function to string together different arguments
+   */
+  template <typename... Args>
+  std::string make_string(Args const&... args) {
+    std::ostringstream ss;
+    using List = int[];
+    (void)List{0, ((void)(ss << args), 0)...};
+
+    return ss.str();
+  }
+
+  void VisitExpr_(const CallNode* op) override {
+    // Descend the call tree
+    for (auto arg : op->args) {
+      VisitExpr(arg);
+    }
+
+    Expr expr = GetRef<Expr>(op);
+    Function func;
+    if (op->op.as<OpNode>()) {
+      LOG(FATAL) << "Operators should be transformed away; try applying"
+                 << "the fuse_ops transformation to the expression.";
+    } else if (op->op.as<GlobalVarNode>()) {
+      LOG(FATAL) << "Not implemented";
+    } else if (op->op.as<FunctionNode>()) {
+      func = GetRef<Function>(op->op.as<FunctionNode>());
+    } else {
+      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->GetTypeKey();
+    }
+    if (!func->HasNonzeroAttr(attr::kPrimitive)) {
+      LOG(FATAL) << "TVM only support calls to primitive functions "
+                 << "(i.e functions composed of fusable operator invocations)";
+    }
+
+    auto pf0 = GetPackedFunc("relay.backend._make_CCacheKey");
+    auto pf1 = GetPackedFunc("relay.backend._CompileEngineLower");
+    Target target;
+    // Handle external function
+    if (func->GetAttr<String>(attr::kCompiler).defined()) {
+      target = Target("ext_dev");
+      CCacheKey key = (*pf0)(func, target);
+      CachedFunc ext_func = (*pf1)(compile_engine_, key);
+      ICHECK(ext_func.defined()) << "External function is not defined.";
+      UpdateConstants(func, &params_);
+
+      // Generate the TIR function call
+      func_call(GetRef<Call>(op), ext_func->func_name);
+    }
+
+    ICHECK_GE(storage_device_map_.count(expr), 0);
+    auto& device_type = storage_device_map_[expr][1];
+    auto call_dev_type = device_type[0]->value;
+    // Normal Relay Function
+    if (targets_.size() == 1) {
+      // homogeneous execution.
+      const auto& it = targets_.begin();
+      target = (*it).second;
+    } else {
+      // heterogeneous execution.
+      std::string call_dev_name;
+      if (call_dev_type == 0) {
+        call_dev_name = "llvm";
+      } else {
+        call_dev_name = runtime::DeviceName(call_dev_type);
+      }
+      if (targets_.count(call_dev_type) == 0) {
+        LOG(FATAL) << "No target is provided for device " << call_dev_name;
+      }
+      target = targets_[call_dev_type];
+    }
+    CCacheKey key = (*pf0)(func, target);
+    CachedFunc lowered_func = (*pf1)(compile_engine_, key);
+    if (!lowered_funcs_.count(target->str())) {
+      lowered_funcs_[target->str()] = IRModule(Map<GlobalVar, BaseFunc>({}));
+    }
+    lowered_funcs_[target->str()]->Update(lowered_func->funcs);
+
+    // Generate the TIR function call
+    func_call(GetRef<Call>(op), lowered_func->func_name);
+  }
+
+  void VisitExpr_(const VarNode* op) override {
+    Expr expr = GetRef<Expr>(op);
+
+    // If the Var node is an output node we need to copy the content of the variable to the output
+    // A Var node can only produce a single output

Review comment:
       I am happy to add this comment, but could you add a bit more detail? I am not sure I follow :) The idea here is that this case only happens when we are assigning an input of the network directly to an output of the network. So I am basically detecting this case and copying the input to the output node.
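
       For readers of this thread, a minimal sketch of what the emitted TIR boils down to in this edge case (illustrative C, not the PR's generated code):

```c
#include <stdint.h>

/* Illustrative equivalent of the copy loop emitted by copy_to_output:
 * when the network returns an input (or a parameter) unchanged, the
 * generated runner just copies the bytes from the input DLTensor's
 * data pointer to the output's. */
static void copy_identity_output(uint8_t* out_data, const uint8_t* in_data,
                                 int32_t size_bytes) {
  for (int32_t i = 0; i < size_bytes; ++i) {
    out_data[i] = in_data[i];
  }
}
```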




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608193957



##########
File path: src/runtime/crt/aot/tvm_executor.c
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// LINT_C_FILE
+
+/*!
+ * \file src/runtime/crt/aot/tvm_executor.c
+ * \brief Internal implementation of the AOT Executor
+ */
+
+#include "tvm_executor.h"
+
+#include <dlpack/dlpack.h>
+
+#include "tvm_backend.h"
+#include "tvm_error.h"
+
+tvm_workspace_t* tvm_runtime_workspace;
+
+tvm_crt_error_t tvm_runtime_run(const tvm_model_t* model, void** inputs, void** outputs,
+                                tvm_context_t* context) {
+  static DLContext fake_ctx = {kDLCPU, 0};
+  static int64_t fake_dims = 0;
+  static int64_t fake_shape = {0};
+
+  DLTensor tensors[model->num_input_tensors + model->num_output_tensors];     // NOLINT

Review comment:
       The problem here, I believe, is that the linter complains that we are allocating variable-length arrays on the stack.
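
       For context, the construct the linter objects to is a C99 variable-length array; a sketch of the usual fixed-capacity workaround (the cap macro is hypothetical, not part of the PR):

```c
#include <dlpack/dlpack.h>

/* Hypothetical build-time cap on the number of I/O tensors; not part
 * of the PR, shown only to illustrate the fixed-size alternative to
 * a variable-length array. */
#define TVM_CRT_MAX_IO_TENSORS 16

static DLTensor io_tensors[TVM_CRT_MAX_IO_TENSORS];
```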




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608176669



##########
File path: python/tvm/relay/backend/graph_executor_factory.py
##########
@@ -41,17 +41,18 @@ class GraphExecutorFactoryModule:
         The parameters of module
     """
 
-    def __init__(self, ir_mod, target, graph_json_str, libmod, libmod_name, params):

Review comment:
       So, this is the same as before. The "string" returned by `tvm.relay.build` is not necessarily a JSON file; it can also be a PrimFunc. That's why I am renaming `graph_json_str` to just `graph_str`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r622139199



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+// LINT_C_FILE
+#include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
+#include <tvm/runtime/crt/logging.h>
+#endif
+
+void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {

Review comment:
       I think this should return tvm_crt_error_t
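
       A sketch of the suggested signature change, assuming the error names from the CRT's error_codes.h; the body is illustrative, not the PR's implementation:

```c
#include <tvm/runtime/crt/error_codes.h>
#include <tvm/runtime/crt/stack_allocator.h>

tvm_crt_error_t StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace,
                                            int32_t nbytes, void** out_ptr) {
  // Padding to the next multiple of the alignment (same math as the PR):
  // for a power-of-two A, (~n + 1) & (A - 1) equals (-n) mod A.
  uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
  uint8_t* workspace_end =
      tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
  if (tvm_runtime_workspace->next_alloc + nbytes + offset_bytes > workspace_end) {
    return kTvmErrorPlatformNoMemory;  // fail via the return code, hand out nothing
  }
  *out_ptr = tvm_runtime_workspace->next_alloc;
  tvm_runtime_workspace->next_alloc += nbytes + offset_bytes;
  return kTvmErrorNoError;
}
```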




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-825992689


   Hi @areusch , 
   I reworked most of the comments, except the ones concerning the stack allocator, where the discussion is still ongoing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r608125123



##########
File path: include/tvm/runtime/module.h
##########
@@ -230,6 +230,8 @@ constexpr const char* tvm_module_main = "__tvm_main__";
 constexpr const char* tvm_param_prefix = "__tvm_param__";
 /*! \brief A PackedFunc that looks up linked parameters by storage_id. */
 constexpr const char* tvm_lookup_linked_param = "_lookup_linked_param";
+/*! \brief The main AOT executor function */
+constexpr const char* tvm_run_func_prefix = "tvm__run_func";

Review comment:
       So you mean let's decide in a single (simpler) PR what the value of `tvm_run_func_prefix` should be? Fine by me :) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-820607337


   @giuseros sorry for the delay, lots to review and stuff. I commented on our codegen thread; let's agree on a path forward for this PR, then we can launch some separate discussions about the pieces we're deferring.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r620355300



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       Yes, I agree on the approach! I just pushed the new tests (with the check enabled). The name of the flag I picked is `TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK`; please let me know if you agree with it, or if you have a better name (naming is not my strong suit :) )
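
       For anyone following along, a minimal sketch of the kind of check the flag guards; the tag scheme is an assumption for illustration, not the PR's exact layout:

```c
// Guarded by the build flag discussed above.
#ifdef TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
#include <stdint.h>

#define STACK_ALLOCATOR_TAG 0xabcd1234u

/* After each block, write a tag derived from the block size; on free,
 * recompute and compare. A mismatch flags either out-of-order frees or
 * other code scribbling over the workspace near the allocator's
 * bookkeeping, at the earliest point it can be detected. */
static uint32_t StackAllocator_MakeTag(uint32_t block_size_bytes) {
  return STACK_ALLOCATOR_TAG ^ block_size_bytes;
}
#endif  // TVM_CRT_STACK_ALLOCATOR_ENABLE_FIFO_CHECK
```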




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619380889



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       The case I'm thinking about is where other code running on the user device overwrites e.g. `next_alloc` or the `ptr` passed to Free(). In this case, the error is likely in the user application rather than in TVM codegen, and this allocator would be a bit of a black box (of course, they could inspect the source code, but it would be helpful to provide an indication of memory corruption at the earliest point it could be detected).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros edited a comment on pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#issuecomment-821404672


   Hi @areusch , 
   Thanks for your comments. I left some questions where I didn't get exactly what you meant (sorry :) ). The major things I will do next week are:
   * Introduce a new intrinsic to support the `cpacked_func` call
   * Move `aot_executor.h` into an internal folder, so that it is not a public interface, and update the `aot_test.mk` makefile


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r619408979



##########
File path: src/runtime/crt/memory/stack_allocator.c
##########
@@ -16,17 +16,22 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 // LINT_C_FILE
-
 #include <tvm/runtime/crt/stack_allocator.h>
+#ifdef TVM_CRT_DEBUG
+#include <tvm/runtime/crt/logging.h>
+#endif
 
 void* StackMemoryManager_Allocate(tvm_workspace_t* tvm_runtime_workspace, int32_t nbytes) {
   uint32_t offset_bytes = (~nbytes + 1) & (TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES - 1);
   uint8_t* current_alloc = tvm_runtime_workspace->next_alloc;
   uint8_t* next_alloc = tvm_runtime_workspace->next_alloc + nbytes + offset_bytes;
   uint8_t* workspace_end = tvm_runtime_workspace->workspace + tvm_runtime_workspace->workspace_size;
-
+#ifdef TVM_CRT_DEBUG

Review comment:
       I get the negative consequences (and therefore agree that we need a check), but I am not sure how realistic it is for the check to be on by default. If we really want it on by default, @giuseros, I'd suggest we need a separate buffer to hold the debug info, not interleaved with the data buffer.
   
   We can always compile with -D STACK_ALLOCATOR_CHECK_ENABLED for unit testing; is there a concern about not being able to unit test because the check is not on by default?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r612602136



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       > it might be harder for us to handle device functions that does not directly corresponds to a symbol
   
   Could you give an example of such a function?
   
   > we should clarify and document what is the CRT standard so that the new code generator can refer to
   
   So, about this, for me the "runtime" of the graph is a triplet of:
   * Calling convention (registry, dynamic or plain)
   * Backend functions (platform specific vs defined in TVM)
   * Graph execution (aot vs graph)
   
   As of now, CRT (i.e., `--runtime=c`) means: `{registry_calling_convention, platform_specific_backend_API, graph_executor}`. On the other hand, AOT (i.e., `--runtime=c --executor=aot`) for now means: `{plain_calling_convention, platform_specific_backend, aot_executor}`. I agree it would be nice to allow all sorts of combinations, but for instance:
   
   * The `registry_calling_convention` is a requirement of the `graph_executor`
   * The `dynamic_calling_convention` might be quite hard to implement in AOT 
   * The `graph_executor` is required (for now) for things like `rpc`
   
   So, I am not sure we can come up on day 1 with a fully general AOT integration. I would say:
   
   * Let's stick with the current AOT definition for now (which is for sure more general than what we had in mind at the beginning :) )
   * I will remove the `is_aot` from the codegen, but I will use a more generic `use_plain_calling_convention`
   * Let's document the exact definition of CRT and AOT and let's try to generalize later
   
   What do you guys think? Also adding @manupa-arm and @mbaret into the discussion
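
   To make the calling-convention distinction concrete, roughly what the two conventions look like in generated C; the names `fused_add` and `call_via_registry`/`call_directly` are made up for the example, and the packed signature shown is a simplification:

```c
#include <tvm/runtime/c_backend_api.h>
#include <tvm/runtime/c_runtime_api.h>

/* Registry convention: the generated code resolves the operator symbol
 * through the module context at run time, then goes through TVMFuncCall. */
static int call_via_registry(void* mod_ctx, TVMValue* values, int* tcodes) {
  TVMFunctionHandle f;
  if (TVMBackendGetFuncFromEnv(mod_ctx, "fused_add", &f) != 0) return -1;
  TVMValue ret_val;
  int ret_tcode;
  return TVMFuncCall(f, values, tcodes, 3, &ret_val, &ret_tcode);
}

/* Plain convention: the runner calls the lowered function directly by
 * symbol, so no function registry (and no lookup cost) is needed. */
extern int32_t fused_add(void* args, void* type_codes, int32_t num_args);

static int call_directly(TVMValue* values, int* tcodes) {
  return fused_add(values, tcodes, 3);
}
```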




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] giuseros commented on a change in pull request #7785: [AOT] Introducing AOT in TVM

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614807239



##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {

Review comment:
       Hi @areusch ,
   As I said on the discuss post, I agree about having a separate discussion on Memory management, PackedFunc and firmware-facing API. I would say that none of this is covered in this PR. In this PR, we have:
   * Relay->TIR codegen (this is the main part)
   * Stack allocator (this is only an add-on, which can be exposed through the usual PlatformAllocator, as for the normal paged allocator)
   * Got rid of function registry when calling the functions in AOT
   * Added a minimal tvm_runtime_run function in aot_executor.h that calls the PackedFunc contained in the tvm_model_t struct.
   The only way I affect the CRT "interface" in this PR is by adding a new implementation of `c_backend_api.h` in `aot_backend_api.c` to get rid of `TVMFuncRegisterGlobal`, which we don't use anymore. If you wish, I can use `crt_backend_api.c` and add a fake implementation of `TVMFuncRegisterGlobal` to work around the undefined symbol.
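
   For the record, the fake implementation mentioned above could be as small as this (a sketch, not the PR's code):

```c
#include <tvm/runtime/c_runtime_api.h>

/* AOT never registers functions at run time, so a stub that only
 * satisfies the linker is enough; returning nonzero keeps any
 * accidental caller from believing the registration succeeded. */
int TVMFuncRegisterGlobal(const char* name, TVMFunctionHandle f, int override) {
  (void)name;
  (void)f;
  (void)override;
  return -1;
}
```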




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org