Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/05/20 15:57:08 UTC

[GitHub] [tvm] giuseros opened a new pull request #8096: Decoupling AOT from graph memory planner

giuseros opened a new pull request #8096:
URL: https://github.com/apache/tvm/pull/8096


   In this PR we decouple AOT from the Graph Memory Planner (GMP). Since AOT expresses the runner function in TIR, we can drop the GMP in Relay and use the StorageRewrite pass to do memory planning on the runner function directly. This also resolves the issue mentioned in #8062
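   As an illustration of what such memory planning achieves, here is a minimal, self-contained sketch in plain Python (hypothetical, not TVM's StorageRewrite implementation): given each temporary's size and live interval, a greedy first-fit planner reuses storage once a buffer is dead:

```python
def plan_memory(buffers):
    """Greedy liveness-based memory planning sketch.

    buffers: list of (size_bytes, first_use, last_use) tuples.
    Returns (offsets, total_bytes): an offset per buffer into one
    shared workspace, reusing space once a buffer is dead.
    """
    order = sorted(range(len(buffers)), key=lambda i: buffers[i][1])
    free = []                    # (offset, size) holes available for reuse
    live = []                    # (last_use, offset, size) of live buffers
    offsets = [None] * len(buffers)
    total = 0
    for i in order:
        size, first, last = buffers[i]
        # Retire buffers whose lifetime ended before this one starts.
        for entry in list(live):
            l, off, sz = entry
            if l < first:
                free.append((off, sz))
                live.remove(entry)
        # First-fit: reuse a freed hole that is large enough.
        for hole in sorted(free):
            off, sz = hole
            if sz >= size:
                free.remove(hole)
                if sz > size:
                    free.append((off + size, sz - size))
                offsets[i] = off
                break
        else:
            offsets[i] = total   # no hole fits: grow the workspace
            total += size
        live.append((last, offsets[i], size))
    return offsets, total
```

Buffers whose live ranges do not overlap end up sharing the same offset, which is the effect the PR obtains by running memory planning on the TIR runner function.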


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] manupa-arm commented on a change in pull request #8096: Decoupling AOT from graph memory planner

manupa-arm commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r659942293



##########
File path: src/tir/transforms/legalize_packed_calls.cc
##########
@@ -60,30 +60,41 @@ class PackedCallLegalizer : public StmtExprMutator {
     if (call) {
       if (call->op.same_as(builtin::tvm_call_cpacked())) {
         Array<PrimExpr> packed_args{call->args[0]};
+        std::vector<tir::Var> tvm_values;
         for (unsigned i = 1; i < call->args.size(); i++) {
           // No need to pack inputs of the prim_func
           if (inputs_[call->args[i]] == true) {
             packed_args.push_back(call->args[i]);
           } else {
             // Pack the argument inside a TVMValue
-            auto sid_array = tir::Var("tvm_value", DataType::Handle());
-            tir::Stmt set_struct_stmt = tir::Evaluate(
+            std::stringstream ss;
+            ss << "tvm_value_" << tvm_value_index_++;
+            auto sid_array = tir::Var(ss.str(), DataType::Handle());
+            tvm_values.push_back(sid_array);
+
+            new_stmts.push_back(tir::Evaluate(
                 tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
-                               {sid_array, 0, tir::builtin::kArrData, call->args[i]}));
-            new_stmts.push_back(LetStmt(sid_array, StackAlloca("array", 1), set_struct_stmt));
+                               {sid_array, 0, tir::builtin::kArrData, call->args[i]})));
             packed_args.push_back(sid_array);
           }
         }
-        // Finally, evaluate the packed call and return a sequential statement
+        // Evaluate the packed call
         new_stmts.push_back(tir::Evaluate(tir::Call(call->dtype, call->op, packed_args)));
-        return tir::SeqStmt(new_stmts);
+        tir::Stmt call_stmt = tir::SeqStmt(new_stmts);
+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);

Review comment:
       yes, that sounds good! (not sure we want a separate pool, but I can see them pooled into a single workspace buffer)







[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-865367080


   Hi @manupa-arm , @areusch , 
   Here is a second attempt to resolve this. I basically applied what we discussed. I also cleaned up the generated TIR a bit by removing all the `tir::Evaluate(0)` statements. Please let me know what you think!





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-869972638


   Hi @areusch , I can reuse that in theory, but `TempStorageInfo` is only a local structure used for on-demand memory allocation, i.e., I don't need to pass `TempStorageInfo` around. This was the reason you suggested not using a struct of arrays here: https://github.com/apache/tvm/pull/8096#discussion_r638171243. I personally agreed with your comment and would prefer to keep it the way it is (i.e., an array of structs), because using a struct of arrays makes the code less readable. However, if you feel strongly about reusing the `StorageInfo` struct of arrays, I can certainly do it





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-869880637


   Hi @manupa-arm , @jroesch , @areusch , 
   I just rebased and had to rename `StorageInfo` to `TempStorageInfo` because of a conflict. Please let me know if this is OK with you!
   
   Thanks,
   Giuseppe






[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-846021337


   @manupa-arm I applied your comments. Sorry, this was a draft that I quickly turned into a PR, hence the inelegance of the code. Please have another look and let me know.





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-854058578


   Hi @areusch , @manupa-arm ,
   I was wondering if there is anything else you would like me to address on this PR.
   
   Thanks!





[GitHub] [tvm] areusch commented on a change in pull request #8096: Decoupling AOT from graph memory planner

areusch commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r659091345



##########
File path: src/tir/transforms/legalize_packed_calls.cc
##########
@@ -60,30 +60,41 @@ class PackedCallLegalizer : public StmtExprMutator {
     if (call) {
       if (call->op.same_as(builtin::tvm_call_cpacked())) {
         Array<PrimExpr> packed_args{call->args[0]};
+        std::vector<tir::Var> tvm_values;
         for (unsigned i = 1; i < call->args.size(); i++) {
           // No need to pack inputs of the prim_func
           if (inputs_[call->args[i]] == true) {
             packed_args.push_back(call->args[i]);
           } else {
             // Pack the argument inside a TVMValue
-            auto sid_array = tir::Var("tvm_value", DataType::Handle());
-            tir::Stmt set_struct_stmt = tir::Evaluate(
+            std::stringstream ss;
+            ss << "tvm_value_" << tvm_value_index_++;
+            auto sid_array = tir::Var(ss.str(), DataType::Handle());
+            tvm_values.push_back(sid_array);
+
+            new_stmts.push_back(tir::Evaluate(
                 tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
-                               {sid_array, 0, tir::builtin::kArrData, call->args[i]}));
-            new_stmts.push_back(LetStmt(sid_array, StackAlloca("array", 1), set_struct_stmt));
+                               {sid_array, 0, tir::builtin::kArrData, call->args[i]})));
             packed_args.push_back(sid_array);
           }
         }
-        // Finally, evaluate the packed call and return a sequential statement
+        // Evaluate the packed call
         new_stmts.push_back(tir::Evaluate(tir::Call(call->dtype, call->op, packed_args)));
-        return tir::SeqStmt(new_stmts);
+        tir::Stmt call_stmt = tir::SeqStmt(new_stmts);
+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);

Review comment:
       just wondering if we may want to eventually create a separate memory pool for these, and whether you'd be open to optionally converting these into `tir.allocate` (in a future PR)







[GitHub] [tvm] manupa-arm commented on a change in pull request #8096: Decoupling AOT from graph memory planner

manupa-arm commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r637039861



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -390,8 +518,8 @@ class AOTExecutorCodegen : public ExprVisitor {
     }
 
     ICHECK_GE(storage_device_map_.count(expr), 0);
-    auto& device_type = storage_device_map_[expr][1];
-    auto call_dev_type = device_type[0]->value;
+    auto& device_type = storage_device_map_[expr].dev_types;
+    auto call_dev_type = device_type[0];

Review comment:
       Sorry, I did not read it properly







[GitHub] [tvm] areusch commented on pull request #8096: Decoupling AOT from graph memory planner

areusch commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-859919232


   hi @giuseros,
   
   you're right about a): my analysis of StorageRewrite was incorrect, and it shouldn't appreciably modify the parameters of the memory map. thanks for correcting me on that.
   
   on b), my concern is:
   - do other uses of StorageRewrite exist that may be affected by the narrow approach taken here?
   - how do we document what the narrow approach needs to accomplish using unit tests so that, if others run into problems with StorageRewrite elsewhere, they don't change it and break us?
   
   On the latter point, my main critique is that the unit tests are pretty coarse (e.g. they are integration tests). there are enough moving pieces that i'm not sure it's obvious from the tests how StorageRewrite should work (it probably is today, but things could change over time that would make it less so). would it be possible to add a finer-grained unit test, e.g. in `tests/python/unittest/test_tir_transform_storage_rewrite.py`?
   
   @tqchen also raised a concern on whether it's feasible to push StorageRewrite in the direction of having pointer analysis. The problem posed was: we currently pass buffers to operator implementations as pointers, and the lifetime of those buffers is clear enough from convention (when the operator impl returns, they are invalid, so the impl cannot save those pointers). if we start down the path of adding a more general reference-counting impl to StorageRewrite, that implies we may be passing user data structs that contain pointers to operator implementations. however, it isn't really clear how we should explain the lifecycle of _those_ pointers.
   
   I think this is sort of a minor point given the current impl, but I also agree that if we are arguing that StorageRewrite's struct_set analysis is in fact a limited application of a general pattern, and implementing the general pattern is also fine, then we do need to elaborate where we are going with it. i do think keeping the impl in AOTExecutorCodegen avoids this problem.
   
   another possible way to alleviate both concerns would be to define an attr on the `tir.tvm_stack_alloca` node that explicitly instructs StorageRewrite to link that tvm_stack_alloca with a given AllocateNode. what do you think about that?





[GitHub] [tvm] PhilippvK commented on pull request #8096: Decoupling AOT from graph memory planner

PhilippvK commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-855934329


   > > thanks @giuseros for working on this! I tested it and it solves the issue that I had. So I will close the issue assuming this will merge sometime soon.
   > 
   > I think we should close issues when things merge, not before that actually happens.
   
   I agree with this. I have experienced the same problems as described in #8062 and thought they had already been fixed. I've also tried out the proposed changes and they work for me, at least for basic use cases.





[GitHub] [tvm] areusch commented on a change in pull request #8096: Decoupling AOT from graph memory planner

areusch commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r646828579



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,185 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about the intermediate tensors in the
+ * runner function
+ */
+struct StorageInfo {
+  /*! \brief storage integer identifier of the particular intermediate buffer */
+  int sid;
+  /*! \brief exact size of the temporary */
+  int size_bytes;
+  /*! \brief device type of the intermediate tensor */
+  int dev_type;
+};
+
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<StorageInfo>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
 
-class AotReturnSidVisitor : public ExprVisitor {
+/**
+ * This is an on-demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
  public:
-  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
-      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+  // run the visitor on a function.
+  void Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
 
-  IntegerArray FindReturnSid(Function func) {
-    VisitExpr(func->body);
-    return return_sid_;
+    for (Expr param : func->params) {
+      CreateStorage(param.operator->());
+    }
+
+    GetStorage(func->body);
   }
 
- protected:
-  void AssignReturnSid(Expr e) {
-    auto iter = storage_device_map_.find(e);
-    if (iter != storage_device_map_.end()) {
-      return_sid_ = (*iter).second[0];
+  std::vector<int> GetReturnIds() const { return return_ids_; }
+
+  StorageMap GetStorageMap() const { return storage_device_map_; }
+
+  void VisitExpr_(const ConstantNode* op) final {
+    CreateStorage(op);
+    AssignReturnSid(GetRef<Expr>(op));
+  }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateStorage(op);
+    for (Expr arg : op->args) {
+      GetStorage(arg);
     }
+    AssignReturnSid(GetRef<Expr>(op));
   }
 
-  void VisitExpr_(const ConstantNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const VarNode* op) final {
+    ExprVisitor::VisitExpr_(op);
+    AssignReturnSid(GetRef<Expr>(op));
   }
 
-  void VisitExpr_(const VarNode* vn) override {
-    ExprVisitor::VisitExpr_(vn);
-    AssignReturnSid(GetRef<Expr>(vn));
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
   }
 
-  void VisitExpr_(const CallNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
   }
 
-  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<StorageInfo> field_sids;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sid = GetStorage(field);
+      field_sids.insert(field_sids.end(), sid.begin(), sid.end());
+    }
+
+    storage_device_map_[expr] = field_sids;
+    AssignReturnSid(expr);
+  }
 
-  void VisitExpr_(const TupleNode* tn) override {
-    ExprVisitor::VisitExpr_(tn);
-    AssignReturnSid(GetRef<Expr>(tn));
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sids = GetStorage(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sids.size());
+    storage_device_map_[expr] = {sids[op->index]};
+    AssignReturnSid(expr);
   }
 
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "let is not supported."; }
+
  private:
-  Map<Expr, Array<IntegerArray>> storage_device_map_;
-  IntegerArray return_sid_;
+  void AssignReturnSid(Expr e) {
+    if (storage_device_map_.find(e) != storage_device_map_.end()) {
+      auto buffers = storage_device_map_[e];
+      std::vector<int> return_ids;
+      for (auto buffer : buffers) {
+        return_ids.push_back(buffer.sid);
+      }
+      return_ids_ = return_ids;
+    }
+  }
+  /*!
+   * \brief ceil(size/word_size) to get number of words.
+   * \param size The original size.
+   * \param word_size The element size.
+   */
+  static size_t DivRoundUp(size_t size, size_t word_size) {
+    return (size + word_size - 1) / word_size;
+  }
+  /*!
+   * \brief Get the memory requirement.
+   * \param ttype The tensor type of the buffer.
+   * \return The required memory size.
+   */
+  size_t GetMemorySize(const TensorTypeNode* ttype) {

Review comment:
       prefer to include units if possible e.g. GetMemorySizeBytes or GetNumBytes
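The size computation under review can be sketched in plain Python (hypothetical helper names, following the rename suggested here): the number of elements times the element size in bytes, with bits rounded up to whole bytes.

```python
def div_round_up(size, word_size):
    # ceil(size / word_size) using integer arithmetic
    return (size + word_size - 1) // word_size

def get_memory_size_bytes(shape, dtype_bits, lanes=1):
    """Sketch of GetMemorySize from the diff above: product of the
    static dimensions times the per-element size in whole bytes."""
    size = 1
    for dim in shape:
        assert dim >= 0, "cannot allocate memory for a negative dimension"
        size *= dim
    return size * div_round_up(dtype_bits * lanes, 8)
```

For example, a (2, 3) float32 tensor needs 6 elements of 4 bytes each, and sub-byte dtypes still round up to one byte per element.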

##########
File path: src/tir/transforms/storage_rewrite.cc
##########
@@ -138,6 +138,34 @@ class LinearAccessPatternFinder final : public StmtExprVisitor {
     if (op->op.same_as(builtin::address_of())) {
       const LoadNode* l = op->args[0].as<LoadNode>();
       this->VisitExpr(l->index);
+    } else if (op->op.same_as(builtin::tvm_call_cpacked())) {
+      // Recall that the arguments of a tvm_call_cpacked are passed as
+      // TVMValues. But a TVMValue is only a container, that points to
+      // a real buffer previously allocated. We need to signal that those
+      // buffers need to be live at the same time (i.e., cannot be overridden)
+      Array<PrimExpr> args = op->args;
+      for (auto arg : args) {
+        const VarNode* var = arg.as<VarNode>();
+        if (value_to_alloc_.find(var) != value_to_alloc_.end()) {
+          auto allocs = value_to_alloc_[var];
+          for (const VarNode* alloc : allocs) {
+            VisitExpr_(alloc);
+          }
+        } else {
+          this->VisitExpr(arg);
+        }
+      }
+    } else if (op->op.same_as(builtin::tvm_struct_set())) {

Review comment:
       are these the only two such builtins we need to care about? seems like any access to the data would be affected, no?
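The propagation being questioned here can be mini-modeled in plain Python (a hypothetical sketch, not TVM code): the `value_to_alloc_` map redirects liveness from a TVMValue container variable to the buffers it wraps, so those buffers are treated as live across the call.

```python
def collect_live_buffers(call_args, value_to_alloc):
    """Sketch of the liveness propagation in the quoted diff.

    call_args: argument names of a tvm_call_cpacked-style call.
    value_to_alloc: map from a TVMValue container name to the
    underlying buffer names it points to.
    Returns the variables that must be considered live: the wrapped
    buffers for containers, the argument itself otherwise.
    """
    live = []
    for arg in call_args:
        # A container contributes its underlying buffers, not itself.
        live.extend(value_to_alloc.get(arg, [arg]))
    return live
```

This is only the liveness-redirection idea; the real pass also has to track `tvm_struct_set` writes to populate the map in the first place.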

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,185 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about the intermediate tensors in the
+ * runner function
+ */
+struct StorageInfo {
+  /*! \brief storage integer identifier of the particular intermediate buffer */
+  int sid;
+  /*! \brief exact size of the temporary */
+  int size_bytes;
+  /*! \brief device type of the intermediate tensor */
+  int dev_type;
+};
+
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<StorageInfo>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
 
-class AotReturnSidVisitor : public ExprVisitor {
+/**
+ * This is an on-demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
  public:
-  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
-      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+  // run the visitor on a function.
+  void Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
 
-  IntegerArray FindReturnSid(Function func) {
-    VisitExpr(func->body);
-    return return_sid_;
+    for (Expr param : func->params) {
+      CreateStorage(param.operator->());
+    }
+
+    GetStorage(func->body);
   }
 
- protected:
-  void AssignReturnSid(Expr e) {
-    auto iter = storage_device_map_.find(e);
-    if (iter != storage_device_map_.end()) {
-      return_sid_ = (*iter).second[0];
+  std::vector<int> GetReturnIds() const { return return_ids_; }
+
+  StorageMap GetStorageMap() const { return storage_device_map_; }
+
+  void VisitExpr_(const ConstantNode* op) final {
+    CreateStorage(op);
+    AssignReturnSid(GetRef<Expr>(op));
+  }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateStorage(op);
+    for (Expr arg : op->args) {
+      GetStorage(arg);
     }
+    AssignReturnSid(GetRef<Expr>(op));
+  }
+
+  void VisitExpr_(const VarNode* op) final {
+    ExprVisitor::VisitExpr_(op);
+    AssignReturnSid(GetRef<Expr>(op));
   }
 
-  void VisitExpr_(const ConstantNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
   }
 
-  void VisitExpr_(const VarNode* vn) override {
-    ExprVisitor::VisitExpr_(vn);
-    AssignReturnSid(GetRef<Expr>(vn));
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
   }
 
-  void VisitExpr_(const CallNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
   }
 
-  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<StorageInfo> field_sids;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sid = GetStorage(field);
+      field_sids.insert(field_sids.end(), sid.begin(), sid.end());
+    }
 
-  void VisitExpr_(const TupleNode* tn) override {
-    ExprVisitor::VisitExpr_(tn);
-    AssignReturnSid(GetRef<Expr>(tn));
+    storage_device_map_[expr] = field_sids;
+    AssignReturnSid(expr);
   }
 
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sids = GetStorage(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sids.size());
+    storage_device_map_[expr] = {sids[op->index]};
+    AssignReturnSid(expr);
+  }
+
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "let is not supported."; }
+
  private:
-  Map<Expr, Array<IntegerArray>> storage_device_map_;
-  IntegerArray return_sid_;
+  void AssignReturnSid(Expr e) {
+    if (storage_device_map_.find(e) != storage_device_map_.end()) {
+      auto buffers = storage_device_map_[e];
+      std::vector<int> return_ids;
+      for (auto buffer : buffers) {
+        return_ids.push_back(buffer.sid);
+      }
+      return_ids_ = return_ids;
+    }
+  }
+  /*!
+   * \brief ceil(size/word_size) to get number of words.
+   * \param size The original size.
+   * \param word_size The element size.
+   */
+  static size_t DivRoundUp(size_t size, size_t word_size) {
+    return (size + word_size - 1) / word_size;
+  }
+  /*!
+   * \brief Get the memory requirement.
+   * \param ttype The tensor type of the buffer.
+   * \return The required memory size.
+   */
+  size_t GetMemorySize(const TensorTypeNode* ttype) {
+    ICHECK(ttype != nullptr);
+    size_t size = 1;
+    for (IndexExpr dim : ttype->shape) {
+      const int64_t* pval = tir::as_const_int(dim);
+      ICHECK(pval != nullptr) << "Cannot allocate memory for symbolic tensor shape " << ttype->shape;
+      ICHECK_GE(*pval, 0) << "Cannot allocate memory for tensor with negative shape " << *pval;
+      size *= static_cast<size_t>(pval[0]);
+    }
+    size *= DivRoundUp(ttype->dtype.bits() * ttype->dtype.lanes(), 8);
+    return size;
+  }
+  /*!
+   * \brief Get the necessary storage for the expression.
+   * \param expr The expression.
+   * \return The corresponding token.
+   */
+  std::vector<StorageInfo> GetStorage(const Expr& expr) {
+    this->VisitExpr(expr);
+    auto it = storage_device_map_.find(expr);
+    ICHECK(it != storage_device_map_.end());
+    return it->second;
+  }
+
+  /*!
+   * \brief Create storage for the expression.
+   * \param expr The expression.
+   */
+  void CreateStorage(const ExprNode* op) {
+    std::vector<StorageInfo> buffers;
+    Expr expr = GetRef<Expr>(op);
+    int device_type = node_device_map_.count(GetRef<Expr>(op)) ? node_device_map_[expr]->value : 0;
+    if (const auto* tuple_type = op->checked_type().as<TupleTypeNode>()) {
+      for (Type t : tuple_type->fields) {
+        const auto* ttype = t.as<TensorTypeNode>();
+        ICHECK(ttype);
+        StorageInfo buffer;
+        buffer.sid = sid_++;
+        buffer.size_bytes = GetMemorySize(ttype);
+        buffer.dev_type = device_type;
+        buffers.push_back(buffer);
+      }
+    } else {
+      const auto* ttype = op->checked_type().as<TensorTypeNode>();
+      ICHECK(ttype);
+      StorageInfo buffer;
+      buffer.sid = sid_++;
+      buffer.size_bytes = GetMemorySize(ttype);
+      buffer.dev_type = device_type;
+      buffers.push_back(buffer);
+    }
+    storage_device_map_[expr] = buffers;
+  }
+  /*! \brief mapping of expression -> storageInfo*/
+  StorageMap storage_device_map_;
+  /*! \brief mapping of expression -> device type*/
+  Map<Expr, Integer> node_device_map_;
+  /*! \brief current id of the temporary allocated*/
+  int sid_{0};

Review comment:
       maybe name this `next_available_sid_`?
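The allocator in the quoted diff can be mini-modeled in plain Python (a hypothetical sketch, using the `next_available_sid` name suggested here): every visited expression receives a fresh storage id together with its size and device type, recorded in the storage map.

```python
class OnDemandAllocator:
    """Sketch of the AOTOnDemandAllocator above: hands out a fresh
    storage id (sid) per expression and records its StorageInfo."""

    def __init__(self):
        self.next_available_sid = 0  # next unassigned storage id
        self.storage_map = {}        # expr -> StorageInfo-like dict

    def create_storage(self, expr, size_bytes, dev_type=0):
        # Mirrors CreateStorage: one new sid per intermediate buffer.
        info = {"sid": self.next_available_sid,
                "size_bytes": size_bytes,
                "dev_type": dev_type}
        self.next_available_sid += 1
        self.storage_map[expr] = info
        return info
```

A later memory planner can then operate on the sids and sizes alone, which is what lets AOT decouple from the graph memory planner.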

##########
File path: src/tir/transforms/storage_rewrite.cc
##########
@@ -138,6 +138,34 @@ class LinearAccessPatternFinder final : public StmtExprVisitor {
     if (op->op.same_as(builtin::address_of())) {
       const LoadNode* l = op->args[0].as<LoadNode>();
       this->VisitExpr(l->index);
+    } else if (op->op.same_as(builtin::tvm_call_cpacked())) {
+      // Recall that the arguments of a tvm_call_cpacked are passed as
+      // TVMValues. But a TVMValue is only a container that points to
+      // a real buffer previously allocated. We need to signal that those
+      // buffers need to be live at the same time (i.e., cannot be overridden)

Review comment:
       nit: change to say something like: (i.e. cannot be overwritten during the function call). 

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -364,5 +364,27 @@ def test_byoc_utvm(use_calculated_workspaces):
     compile_and_run(mod, input_list, output_list, use_calculated_workspaces)
 
 
+def test_quant_mobilenet_tfl():

Review comment:
       can you also do this for the test case below?

+
  private:
-  Map<Expr, Array<IntegerArray>> storage_device_map_;
-  IntegerArray return_sid_;
+  void AssignReturnSid(Expr e) {
+    if (storage_device_map_.find(e) != storage_device_map_.end()) {
+      auto buffers = storage_device_map_[e];
+      std::vector<int> return_ids;

Review comment:
       nit: perhaps slightly faster to mutate return_ids_ rather than assign?

##########
File path: src/tir/transforms/storage_rewrite.cc
##########
@@ -206,6 +234,8 @@ class LinearAccessPatternFinder final : public StmtExprVisitor {
   bool in_thread_env_{false};
   // The scope stack.
   std::vector<StmtEntry> scope_;
+  // This is a map to connect TVMValues to real allocations
+  std::unordered_map<const VarNode*, std::vector<const VarNode*>> value_to_alloc_;

Review comment:
       could you update the comment to better explain the keys and values, and the rules for when something should be added here?

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -364,5 +364,27 @@ def test_byoc_utvm(use_calculated_workspaces):
     compile_and_run(mod, input_list, output_list, use_calculated_workspaces)
 
 
+def test_quant_mobilenet_tfl():

Review comment:
       yes, it would be great to explain the thing we are trying to test using mobilenet, and if possible, contrive a testcase not based on data from the internet.

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,185 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about the intermediate tensors in the
+ * runner function
+ */
+struct StorageInfo {
+  /*! \brief storage integer identifier of the particular intermediate buffer */
+  int sid;
+  /*! \brief exact size of the temporary */
+  int size_bytes;
+  /*! \brief device type of the intermediate tensor */
+  int dev_type;
+};
+
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<StorageInfo>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
 
-class AotReturnSidVisitor : public ExprVisitor {
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
  public:
-  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
-      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+  // run the visitor on a function.
+  void Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
 
-  IntegerArray FindReturnSid(Function func) {
-    VisitExpr(func->body);
-    return return_sid_;
+    for (Expr param : func->params) {
+      CreateStorage(param.operator->());
+    }
+
+    GetStorage(func->body);
   }
 
- protected:
-  void AssignReturnSid(Expr e) {
-    auto iter = storage_device_map_.find(e);
-    if (iter != storage_device_map_.end()) {
-      return_sid_ = (*iter).second[0];
+  std::vector<int> GetReturnIds() const { return return_ids_; }
+
+  StorageMap GetStorageMap() const { return storage_device_map_; }
+
+  void VisitExpr_(const ConstantNode* op) final {
+    CreateStorage(op);
+    AssignReturnSid(GetRef<Expr>(op));
+  }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateStorage(op);
+    for (Expr arg : op->args) {
+      GetStorage(arg);
     }
+    AssignReturnSid(GetRef<Expr>(op));
+  }
+
+  void VisitExpr_(const VarNode* op) final {
+    ExprVisitor::VisitExpr_(op);
+    AssignReturnSid(GetRef<Expr>(op));
   }
 
-  void VisitExpr_(const ConstantNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
   }
 
-  void VisitExpr_(const VarNode* vn) override {
-    ExprVisitor::VisitExpr_(vn);
-    AssignReturnSid(GetRef<Expr>(vn));
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
   }
 
-  void VisitExpr_(const CallNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
   }
 
-  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<StorageInfo> field_sids;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sid = GetStorage(field);
+      field_sids.insert(field_sids.end(), sid.begin(), sid.end());
+    }
 
-  void VisitExpr_(const TupleNode* tn) override {
-    ExprVisitor::VisitExpr_(tn);
-    AssignReturnSid(GetRef<Expr>(tn));
+    storage_device_map_[expr] = field_sids;
+    AssignReturnSid(expr);
   }
 
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sids = GetStorage(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sids.size());
+    storage_device_map_[expr] = {sids[op->index]};
+    AssignReturnSid(expr);
+  }
+
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "if is not supported."; }

Review comment:
       nit: "let is not supported"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] areusch edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-869882513


   hey @giuseros, i think @jroesch was hoping with #8297 that you could re-use that common StorageInfo struct for your purposes here. is it possible to do that? sorry if this was unclear from that PR.





[GitHub] [tvm] giuseros commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r638852403



##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -364,5 +364,27 @@ def test_byoc_utvm(use_calculated_workspaces):
     compile_and_run(mod, input_list, output_list, use_calculated_workspaces)
 
 
+def test_quant_mobilenet_tfl():

Review comment:
       This is testing #8062. Should I explicitly mention the issue in a comment?







[GitHub] [tvm] areusch commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r638207535



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,179 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about intermediate variables in the
+ * runner function
+ */
+struct StorageInfo {
+  /*! \brief unique integer identifier of the particular intermediate variable */

Review comment:
       it would be great to qualify the name `ids` here, since there are storage ids, buffer ids, etc. also, would it make more sense to say "intermediate buffer" here?

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,179 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about intermediate variables in the
+ * runner function
+ */
+struct StorageInfo {

Review comment:
       each of these vectors is of the same length, with corresponding elements at matching indices, right? does it then make sense to just declare `std::vector<StorageInfo>` and remove the vectors from here? In GraphPlanMemory, we split them because you can't send user-defined data types through PackedFunc calls.
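       The struct-of-arrays vs. array-of-structs trade-off described above can be illustrated with a short Python sketch. The field names mirror the C++ `StorageInfo`, but this is illustrative only, not TVM code:

```python
from dataclasses import dataclass

# Struct-of-arrays: three parallel lists that must be kept in sync by hand.
soa = {"ids": [0, 1], "sizes_bytes": [64, 128], "dev_types": [1, 1]}

# Array-of-structs: one record per intermediate buffer, so corresponding
# fields can never drift out of alignment.
@dataclass
class StorageInfo:
    sid: int
    size_bytes: int
    dev_type: int

aos = [StorageInfo(0, 64, 1), StorageInfo(1, 128, 1)]

# The same lookup, written both ways:
assert soa["sizes_bytes"][1] == 128
assert aos[1].size_bytes == 128
```

       With the AoS layout, inserting or removing one buffer's record is a single operation, whereas the SoA layout requires touching every list consistently.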

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,179 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about intermediate variables in the
+ * runner function
+ */
+struct StorageInfo {
+  /*! \brief unique integer identifier of the particular intermediate variable */
+  std::vector<int> ids;
+  /*! \brief exact size of the temporary */
+  std::vector<int> sizes_bytes;
+  /*! \brief device type of the temporary variable */
+  std::vector<int> dev_types;
+};
+
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap =
+    std::unordered_map<Expr, StorageInfo, runtime::ObjectPtrHash, runtime::ObjectPtrEqual>;
 
-class AotReturnSidVisitor : public ExprVisitor {
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
  public:
-  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
-      : storage_device_map_{storage_device_map}, return_sid_{-1} {}
+  // run the visitor on a function.
+  void Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
 
-  IntegerArray FindReturnSid(Function func) {
-    VisitExpr(func->body);
-    return return_sid_;
+    for (Expr param : func->params) {
+      CreateSid(param.operator->());
+    }
+
+    GetSid(func->body);
   }
 
- protected:
-  void AssignReturnSid(Expr e) {
-    auto iter = storage_device_map_.find(e);
-    if (iter != storage_device_map_.end()) {
-      return_sid_ = (*iter).second[0];
+  std::vector<int> GetReturnIds() const { return return_ids_; }
+
+  StorageMap GetStorageMap() const { return storage_device_map_; }
+
+  void VisitExpr_(const ConstantNode* op) final {
+    CreateSid(op);
+    AssignReturnSid(GetRef<Expr>(op));
+  }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateSid(op);
+    for (Expr arg : op->args) {
+      GetSid(arg);
     }
+    AssignReturnSid(GetRef<Expr>(op));
   }
 
-  void VisitExpr_(const ConstantNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const VarNode* op) final {
+    ExprVisitor::VisitExpr_(op);
+    AssignReturnSid(GetRef<Expr>(op));
   }
 
-  void VisitExpr_(const VarNode* vn) override {
-    ExprVisitor::VisitExpr_(vn);
-    AssignReturnSid(GetRef<Expr>(vn));
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
   }
 
-  void VisitExpr_(const CallNode* cn) override {
-    ExprVisitor::VisitExpr_(cn);
-    AssignReturnSid(GetRef<Expr>(cn));
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
   }
 
-  void VisitExpr_(const LetNode* op) override { VisitExpr(op->body); }
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const TupleNode* op) final {
+    StorageInfo field_sid;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sid = GetSid(field);
+      field_sid.ids.insert(field_sid.ids.end(), sid.ids.begin(), sid.ids.end());
+      field_sid.dev_types.insert(field_sid.dev_types.end(), sid.dev_types.begin(),
+                                 sid.dev_types.end());
+      field_sid.sizes_bytes.insert(field_sid.sizes_bytes.end(), sid.sizes_bytes.begin(),
+                                   sid.sizes_bytes.end());
+    }
+
+    storage_device_map_[expr] = field_sid;
+    AssignReturnSid(expr);
+  }
 
-  void VisitExpr_(const TupleNode* tn) override {
-    ExprVisitor::VisitExpr_(tn);
-    AssignReturnSid(GetRef<Expr>(tn));
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sid = GetSid(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sid.ids.size());
+    storage_device_map_[expr].ids = {sid.ids[op->index]};
+    storage_device_map_[expr].sizes_bytes = {sid.sizes_bytes[op->index]};
+    storage_device_map_[expr].dev_types = {sid.dev_types[op->index]};
+    AssignReturnSid(expr);
   }
 
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "if is not supported."; }
+
  private:
-  Map<Expr, Array<IntegerArray>> storage_device_map_;
-  IntegerArray return_sid_;
+  void AssignReturnSid(Expr e) {
+    auto iter = storage_device_map_.find(e);
+    if (iter != storage_device_map_.end()) {
+      return_ids_ = (*iter).second.ids;
+    }
+  }
+  /*!
+   * \brief ceil(size/word_size) to get number of words.
+   * \param size The original size.
+   * \param word_size The element size.
+   */
+  static size_t DivRoundUp(size_t size, size_t word_size) {
+    return (size + word_size - 1) / word_size;
+  }
+  /*!
+   * \brief Get the memory requirement.
+   * \param prototype The prototype token.
+   * \return The required memory size.
+   */
+  size_t GetMemorySize(const TensorTypeNode* ttype) {
+    ICHECK(ttype != nullptr);
+    size_t size = 1;
+    for (IndexExpr dim : ttype->shape) {
+      const int64_t* pval = tir::as_const_int(dim);
+      ICHECK(pval != nullptr) << "Cannot allocate memory symbolic tensor shape " << ttype->shape;
+      ICHECK_GE(*pval, 0) << "Cannot allocate memory for tensor with negative shape" << *pval;
+      size *= static_cast<size_t>(pval[0]);
+    }
+    size *= DivRoundUp(ttype->dtype.bits() * ttype->dtype.lanes(), 8);
+    return size;
+  }
+  /*!
+   * \brief Get the necessary token.
+   * \param expr The expression.
+   * \return The corresponding token.
+   */
+  StorageInfo GetSid(const Expr& expr) {
+    this->VisitExpr(expr);
+    auto it = storage_device_map_.find(expr);
+    ICHECK(it != storage_device_map_.end());
+    return it->second;
+  }
+
+  void CreateSid(const ExprNode* op) {
+    StorageInfo sid;
+    Expr expr = GetRef<Expr>(op);
+    int device_type = node_device_map_.count(GetRef<Expr>(op)) ? node_device_map_[expr]->value : 0;
+    if (const auto* tuple_type = op->checked_type().as<TupleTypeNode>()) {
+      for (Type t : tuple_type->fields) {
+        const auto* ttype = t.as<TensorTypeNode>();
+        ICHECK(ttype);
+        sid.ids.push_back(sid_++);
+        sid.dev_types.push_back(device_type);
+        sid.sizes_bytes.push_back(GetMemorySize(ttype));
+      }
+    } else {
+      const auto* ttype = op->checked_type().as<TensorTypeNode>();
+      ICHECK(ttype);
+      sid.ids.push_back(sid_++);
+      sid.dev_types.push_back(device_type);
+      sid.sizes_bytes.push_back(GetMemorySize(ttype));
+    }
+    storage_device_map_[expr] = sid;
+  }
+  /*! \brief mapping of expression -> storageInfo*/
+  StorageMap storage_device_map_;
+  /*! \brief mapping of expression -> device type*/
+  Map<Expr, Integer> node_device_map_;
+  /*! \brief current id of the temporary allocated*/
+  int sid_{0};
+  /*! \brief the set of identifiers that are return variables */

Review comment:
       could you explain in comment what the "identifiers" are here? there are lots of those here now.

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -364,5 +364,27 @@ def test_byoc_utvm(use_calculated_workspaces):
     compile_and_run(mod, input_list, output_list, use_calculated_workspaces)
 
 
+def test_quant_mobilenet_tfl():

Review comment:
       could you clarify which part of the PR this is testing?







[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-870512072


   Hi @jroesch, 
   The solution I picked is to use the common `StorageInfo` so that later on we can have a PR to move from SoA to AoS for all the users of that structure (i.e., all the memory planning algorithms). 
   
   Thanks,
   Giuseppe





[GitHub] [tvm] areusch commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-870979958


   thanks for working through this one with us @giuseros, the PR is now merged! we will now temporarily prioritize #7518 over further compiler changes. we aim to land that this week, so feel free to send additional PRs based on top of that if you'd like.





[GitHub] [tvm] giuseros commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r659914709



##########
File path: src/tir/transforms/legalize_packed_calls.cc
##########
@@ -60,30 +60,41 @@ class PackedCallLegalizer : public StmtExprMutator {
     if (call) {
       if (call->op.same_as(builtin::tvm_call_cpacked())) {
         Array<PrimExpr> packed_args{call->args[0]};
+        std::vector<tir::Var> tvm_values;
         for (unsigned i = 1; i < call->args.size(); i++) {
           // No need to pack inputs of the prim_func
           if (inputs_[call->args[i]] == true) {
             packed_args.push_back(call->args[i]);
           } else {
             // Pack the argument inside a TVMValue
-            auto sid_array = tir::Var("tvm_value", DataType::Handle());
-            tir::Stmt set_struct_stmt = tir::Evaluate(
+            std::stringstream ss;
+            ss << "tvm_value_" << tvm_value_index_++;
+            auto sid_array = tir::Var(ss.str(), DataType::Handle());
+            tvm_values.push_back(sid_array);
+
+            new_stmts.push_back(tir::Evaluate(
                 tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
-                               {sid_array, 0, tir::builtin::kArrData, call->args[i]}));
-            new_stmts.push_back(LetStmt(sid_array, StackAlloca("array", 1), set_struct_stmt));
+                               {sid_array, 0, tir::builtin::kArrData, call->args[i]})));
             packed_args.push_back(sid_array);
           }
         }
-        // Finally, evaluate the packed call and return a sequential statement
+        // Evaluate the packed call
         new_stmts.push_back(tir::Evaluate(tir::Call(call->dtype, call->op, packed_args)));
-        return tir::SeqStmt(new_stmts);
+        tir::Stmt call_stmt = tir::SeqStmt(new_stmts);
+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);

Review comment:
       Yes, I think it is a very good idea, and I would personally like a similar direction. @manupa-arm, what do you think?







[GitHub] [tvm] areusch commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r649381183



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -44,52 +45,184 @@ namespace tvm {
 namespace relay {
 namespace backend {
 
+/**
+ * Struct to contain information about the intermediate tensors in the
+ * runner function
+ */
+struct StorageInfo {
+  /*! \brief storage integer identifier of the particular intermediate buffer */
+  int sid;
+  /*! \brief exact size of the temporary */
+  int size_bytes;
+  /*! \brief device type of the intermediate tensor */
+  int dev_type;
+};
+
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<StorageInfo>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
 
-class AotReturnSidVisitor : public ExprVisitor {
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {

Review comment:
       i guess, do we want to add any asserts here to ensure that we don't see top-level nodes other than Constant, CallNode, VarNode, or TupleGetItemNode? what if someone modifies the AOT TIR codegen to add some new pattern but forgets to update this? it seems like something should fail then.

##########
File path: tests/python/relay/aot/test_crt_aot.py
##########
@@ -364,5 +364,27 @@ def test_byoc_utvm(use_calculated_workspaces):
     compile_and_run(mod, input_list, output_list, use_calculated_workspaces)
 
 
+def test_quant_mobilenet_tfl():

Review comment:
       ok so here's my concern with this--it's great to test on real-world test cases, but we are really only certain that this test case exercises the logic path we want it to because, at the time of this PR, we manually traced it through and verified that this model triggers the particular scenario we care about. most ideally there would be a contrived unit test, but it can be tricky to write given the number of moving pieces. so i'm okay with using mobilenet as an end-to-end thing, but i'd like you to add some asserts to this test case that verifies that the logic path you're trying to test (e.g. avoiding reuse of the output buffer) is actually triggered, such that if someone modifies AOTExecutorCodegen in the future, this test would fail if that is no longer true.
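       The kind of guard being requested -- "the output buffer is never reused" -- can be sketched as a toy greedy planner plus an assertion. All names here are hypothetical; a real test would assert on the TIR produced by AOTExecutorCodegen instead:

```python
def plan_storage(live_ranges, outputs):
    """Greedy reuse: a buffer may take over a slot whose previous owner's
    lifetime has ended -- unless that owner is a function output, which
    must stay valid after the function returns."""
    slot_free_at = {}  # slot id -> time at which its current owner dies
    assignment = {}    # buffer name -> slot id
    for buf, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        reusable = [s for s, t in slot_free_at.items() if t < start]
        slot = reusable[0] if reusable else len(slot_free_at)
        assignment[buf] = slot
        # an output buffer is live "forever", so its slot is never reusable
        slot_free_at[slot] = float("inf") if buf in outputs else end
    return assignment

ranges = {"a": (0, 1), "out": (1, 2), "b": (3, 4)}
assign = plan_storage(ranges, outputs={"out"})
assert assign["b"] == assign["a"]      # dead intermediate storage is reused
assert assign["b"] != assign["out"]    # but the output buffer never is
```

       An end-to-end mobilenet run exercises the same property implicitly; the assertion makes the intent explicit so a future regression fails loudly.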

##########
File path: src/tir/transforms/storage_rewrite.cc
##########
@@ -138,6 +138,34 @@ class LinearAccessPatternFinder final : public StmtExprVisitor {
     if (op->op.same_as(builtin::address_of())) {
       const LoadNode* l = op->args[0].as<LoadNode>();
       this->VisitExpr(l->index);
+    } else if (op->op.same_as(builtin::tvm_call_cpacked())) {
+      // Recall that the arguments of a tvm_call_cpacked are passed as
+      // TVMValues. But a TVMValue is only a container, that points to
+      // a real buffer previously allocated. We need to signal that those
+      // buffers need to be live at the same time (i.e., cannot be overridden)
+      Array<PrimExpr> args = op->args;
+      for (auto arg : args) {
+        const VarNode* var = arg.as<VarNode>();
+        if (value_to_alloc_.find(var) != value_to_alloc_.end()) {
+          auto allocs = value_to_alloc_[var];
+          for (const VarNode* alloc : allocs) {
+            VisitExpr_(alloc);
+          }
+        } else {
+          this->VisitExpr(arg);
+        }
+      }
+    } else if (op->op.same_as(builtin::tvm_struct_set())) {

Review comment:
       i guess another way to view this though is that this PR is adding reference tracking, but reference tracking only works with one particular TIR pattern. i agree with you that adding full reference tracking is a lot to ask particularly considering the goals of this PR. However, there's no documentation (by which I mean no unit test) as to what specifically needs to be tracked. can you add a unit test for the changes you've made to StorageRewrite?







[GitHub] [tvm] areusch commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-869882513


   hey @giuseros, i think @jroesch was hoping with #8297 that you could re-use that common StorageInfo struct for your purposes here. is it possible to do that?





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-854058578


   Hi @areusch , @manupa-arm ,
   I was wondering if there was anything else you may want me to address on this PR. 
   
   Thanks!





[GitHub] [tvm] u99127 commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
u99127 commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-845886007


   > thanks @giuseros for working on this! I tested it and it solves the issue that I had. So I close the issue assuming this will merge sometimes soon.
   
   I think we should close issues when things merge not before that actually happens.





[GitHub] [tvm] manupa-arm commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r659942293



##########
File path: src/tir/transforms/legalize_packed_calls.cc
##########
@@ -60,30 +60,41 @@ class PackedCallLegalizer : public StmtExprMutator {
     if (call) {
       if (call->op.same_as(builtin::tvm_call_cpacked())) {
         Array<PrimExpr> packed_args{call->args[0]};
+        std::vector<tir::Var> tvm_values;
         for (unsigned i = 1; i < call->args.size(); i++) {
           // No need to pack inputs of the prim_func
           if (inputs_[call->args[i]] == true) {
             packed_args.push_back(call->args[i]);
           } else {
             // Pack the argument inside a TVMValue
-            auto sid_array = tir::Var("tvm_value", DataType::Handle());
-            tir::Stmt set_struct_stmt = tir::Evaluate(
+            std::stringstream ss;
+            ss << "tvm_value_" << tvm_value_index_++;
+            auto sid_array = tir::Var(ss.str(), DataType::Handle());
+            tvm_values.push_back(sid_array);
+
+            new_stmts.push_back(tir::Evaluate(
                 tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
-                               {sid_array, 0, tir::builtin::kArrData, call->args[i]}));
-            new_stmts.push_back(LetStmt(sid_array, StackAlloca("array", 1), set_struct_stmt));
+                               {sid_array, 0, tir::builtin::kArrData, call->args[i]})));
             packed_args.push_back(sid_array);
           }
         }
-        // Finally, evaluate the packed call and return a sequential statement
+        // Evaluate the packed call
         new_stmts.push_back(tir::Evaluate(tir::Call(call->dtype, call->op, packed_args)));
-        return tir::SeqStmt(new_stmts);
+        tir::Stmt call_stmt = tir::SeqStmt(new_stmts);
+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);

Review comment:
       yes, that sounds good! (not sure we want a separate pool, but I can see them pooled to 'a' workspace buffer)







[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-856279409


   Hi @areusch , 
   I applied your changes. 
   * About the storage rewrite changes, I tried to explain more in the code. It's about the fact that in order to call a function, you need to set a struct. The rewriter focuses on the structs (marking them conflicting) but the real allocations are lost (and get reused even when they don't need to be). If you have more questions, feel free to ask. 
   * The point about testing is that it's very useful for us to test a quantized mobilenet, and coming up with the edge situation that that network is testing is non-trivial. So I saw having that network test as a two-bird-one-stone situation. 
   * About the unified static memory planner, yes. This change will be incorporated by the planner that will replace the storage rewrite pass. 





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-856279409


   Hi @areusch , 
   I applied your changes. 
   * About the storage rewrite changes, I tried to explain more in the code. It's about the fact that in order to call a function, you need to set a struct. The rewriter focuses on the struct (marking them conflicting) but the real allocations are lost (and get reused). If you have more questions, feel free to ask. 
   * The point about testing is that it's very useful for us to test a quantized mobilenet, and coming up with the edge situation that that network is testing is non-trivial. So I saw having that network test as a two-bird-one-stone situation. 





[GitHub] [tvm] giuseros commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r659914709



##########
File path: src/tir/transforms/legalize_packed_calls.cc
##########
@@ -60,30 +60,41 @@ class PackedCallLegalizer : public StmtExprMutator {
     if (call) {
       if (call->op.same_as(builtin::tvm_call_cpacked())) {
         Array<PrimExpr> packed_args{call->args[0]};
+        std::vector<tir::Var> tvm_values;
         for (unsigned i = 1; i < call->args.size(); i++) {
           // No need to pack inputs of the prim_func
           if (inputs_[call->args[i]] == true) {
             packed_args.push_back(call->args[i]);
           } else {
             // Pack the argument inside a TVMValue
-            auto sid_array = tir::Var("tvm_value", DataType::Handle());
-            tir::Stmt set_struct_stmt = tir::Evaluate(
+            std::stringstream ss;
+            ss << "tvm_value_" << tvm_value_index_++;
+            auto sid_array = tir::Var(ss.str(), DataType::Handle());
+            tvm_values.push_back(sid_array);
+
+            new_stmts.push_back(tir::Evaluate(
                 tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
-                               {sid_array, 0, tir::builtin::kArrData, call->args[i]}));
-            new_stmts.push_back(LetStmt(sid_array, StackAlloca("array", 1), set_struct_stmt));
+                               {sid_array, 0, tir::builtin::kArrData, call->args[i]})));
             packed_args.push_back(sid_array);
           }
         }
-        // Finally, evaluate the packed call and return a sequential statement
+        // Evaluate the packed call
         new_stmts.push_back(tir::Evaluate(tir::Call(call->dtype, call->op, packed_args)));
-        return tir::SeqStmt(new_stmts);
+        tir::Stmt call_stmt = tir::SeqStmt(new_stmts);
+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);

Review comment:
       Yes, I think it is a very good idea, and I would personally like a similar direction. @manupa-arm, what do you think?







[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-858730951


   Hi @areusch , 
   Any more thoughts on this?
   
   Thanks,
   Giuseppe





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-866374368


   Hi @manupa-arm , @areusch , 
   Quick update on this. I added a unit test and slightly changed the pass to address some issues. This should be good enough to be reviewed. 
   
   Thanks,
   Giuseppe





[GitHub] [tvm] areusch edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-869882513


   hey @giuseros, i think @jroesch was hoping with #8297 that you could re-use that common StorageInfo struct for your purposes here. is it possible to do that? sorry if this was unclear from that PR.





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-860715052


   Hi @areusch , 
   After some thinking we came up with a different solution. The way we are doing things now is the following:
   pass-a) Compose the `main_func` with `tvm_set_struct`, i.e., codegen ready
   pass-b) Storage rewrite modified to take care of the structs
   pass-c) tvm::build
   A possible alternative can be the following:
   pass-a2) Compose the `main_func` without `tvm_set_struct`. This means that the packed calls will receive raw pointers. This in turn means that the `main_func` in TIR is not ready to be code generated
   pass-b2) Storage rewrite without any change. This is possible now since we are not using `tvm_set_struct` yet
   pass-c2) transform the packed calls inputs by using `tvm_set_struct`. We can actually avoid this pass if we are using unpacked signatures. 
   
   With pipeline-2 we can leave `StorageRewrite` unchanged and still avoid the issues that `tvm_set_struct` gives us. What do you think?
   





[GitHub] [tvm] jroesch commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
jroesch commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-870097981


   @giuseros I would prefer to also use an array of structs, but my goal is to just get us to use the same data structure everywhere since there are multiple places where we are storing different versions of the same data without a shared data structure. We could move the Array outside and simplify the struct but it would be good to move memory planning to also use the same structure. 





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-867704948


   A friendly ping @manupa-arm , @jroesch , @areusch !





[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-856279409


   Hi @areusch , 
   I applied your changes. 
   * About the storage rewrite changes, I tried to explain more in the code. It's about the fact that in order to call a function, you need to set a struct. The rewriter focuses on the structs (marking them conflicting) but the real allocations are lost (and get reused). If you have more questions, feel free to ask. 
   * The point about testing is that it's very useful for us to test a quantized mobilenet, and coming up with the edge situation that that network is testing is non-trivial. So I saw having that network test as a two-bird-one-stone situation. 
   * About the unified static memory planner, yes. This change will be incorporated by the planner that will replace the storage rewrite pass. 





[GitHub] [tvm] areusch commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-869882513


   hey @giuseros, i think @jroesch was hoping with #8297 that you could re-use that common StorageInfo struct for your purposes here. is it possible to do that?





[GitHub] [tvm] mehrdadh commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
mehrdadh commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-856101406


   @giuseros there is a lint issue due to black update. You can run "docker/lint.sh python_format" to see the error locally.





[GitHub] [tvm] areusch merged pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch merged pull request #8096:
URL: https://github.com/apache/tvm/pull/8096


   





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-859519657


   Hi @areusch, 
   Thanks for your reply. I think what you are saying can be summarized as:
   a) The StorageRewrite pass does not solve our problem because it can still touch the output buffers (which are parameters of the `tir PrimFunc`)
   b) The solution that we have put in place in `StorageRewrite` does not address the reference counting in a general way. 
   
   About a), I disagree. `StorageRewrite` only rewrites `tir.Allocate` nodes, and the output is not allocated in our main runner function. This is why if you run mobilenet quantized with the `StorageRewrite` pass, it works (and it doesn't work with Graph Memory Planning). So `StorageRewrite` is a perfectly good candidate to solve the issue we have in #8062 . The problem with `StorageRewrite` is that it doesn't handle `tvm_struct_set` correctly, and this brings us to point b)
   
   About b), I agree. The solution we are proposing is not general, but any general solution can be built on top of it. As you said, we are trying to solve a specific case, but when you tackle the general case, the code I wrote is still valid. Especially in a TIR-based world (which is the aim of TensorIR, afaiu), I think we will (at some point) want to extend this solution instead of accepting that `StorageRewrite` does not work with some TIR built-ins. 
   
   In other words, our solution is not tailored to AOT at all. It is a partial solution to a generic problem that (only) AOT is hitting. It cannot break anything in the future: either the user is doing what AOT is doing, and it will work, or, in more edge cases, `StorageRewrite` won't work, just as it didn't work before. 
   
   Bear in mind also that, until we have a static memory planner, AOT is not usable without this solution. If you want, we can try to write a test that stresses the problem we are trying to solve with the change in `StorageRewrite`. 
   
   What do you think?
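   The hazard under discussion can be sketched outside of TVM entirely. The following is an illustrative C++ sketch, not TVM code (`ValueWrapper`, `SumBoth`, and `RunWithSharedStorage` are invented names): a wrapper struct only carries a pointer, so a planner that lets a later buffer reuse the wrapped buffer's storage silently clobbers data that the packed call still needs.

```cpp
#include <array>
#include <cstdint>

// Stand-in for a TVMValue-like container: it only stores a pointer.
struct ValueWrapper {
  int32_t* data;
};

// The "packed call": consumes both wrapped buffers.
int32_t SumBoth(const ValueWrapper& a, const ValueWrapper& b) {
  return a.data[0] + b.data[0];
}

int32_t RunWithSharedStorage(bool share) {
  std::array<int32_t, 1> storage_a{};
  std::array<int32_t, 1> storage_b{};
  int32_t* buf_a = storage_a.data();
  // If a planner wrongly decides buf_b may reuse buf_a's storage...
  int32_t* buf_b = share ? storage_a.data() : storage_b.data();

  buf_a[0] = 1;           // producer of the first intermediate
  ValueWrapper a{buf_a};  // pointer stashed into a struct; only the struct
                          // looks live from here on
  buf_b[0] = 2;           // ...this write clobbers buf_a when storage is shared
  ValueWrapper b{buf_b};
  return SumBoth(a, b);   // expects 1 + 2 = 3
}
```

   With distinct storage the call returns 3; with shared storage it returns 4, because the second write overwrote the first buffer through the reused storage.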





[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-860715052


   Hi @areusch , 
   After some thinking we came up with a different solution. The way we are doing things now is the following:
   pass-a) Compose the `main_func` with `tvm_set_struct`, i.e., codegen ready
   pass-b) Storage rewrite modified to take care of the structs
   pass-c) `tvm::build`
   
   A possible alternative can be the following:
   pass-a2) Compose the `main_func` without `tvm_set_struct`. This means that the packed calls will receive raw pointers. This in turn means that the `main_func` in TIR is not ready to be code generated
   pass-b2) Storage rewrite without any change. This is possible now since we are not using `tvm_set_struct` yet
   pass-c2) transform the packed calls inputs by using `tvm_set_struct`. We can actually avoid this pass if we are using unpacked signatures. 
   pass-d2) `tvm::build`
   
   With pipeline-2 we can leave `StorageRewrite` unchanged and still address the issues that `tvm_set_struct` gives us. What do you think?
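   The idea behind "pass-c2" can be sketched as follows. This is an illustrative C++ sketch, not TVM code (`PackedValue` and `CallPacked` are invented names): the struct packing is deferred to the very last moment, so everything upstream of the call, including the memory planner, sees only raw pointers.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for TVMValue: a plain container around a pointer.
struct PackedValue {
  void* v_handle;
};

using PackedFn = int32_t (*)(const std::vector<PackedValue>&);

// Wrap every raw pointer argument right at the call site, mirroring the
// idea of legalizing packed calls *after* memory planning has run on the
// bare pointers.
int32_t CallPacked(PackedFn fn, const std::vector<void*>& raw_args) {
  std::vector<PackedValue> packed;
  packed.reserve(raw_args.size());
  for (void* p : raw_args) {
    packed.push_back(PackedValue{p});  // late packing: the planner never saw this
  }
  return fn(packed);
}
```

   With unpacked signatures even this late wrapping step could be skipped, since the callee would accept the raw pointers directly.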
   





[GitHub] [tvm] manupa-arm commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-863428431


   Yes, that is the overall plan; we'd rather not run StorageRewrite before the USMP passes, because it will perform non-optimal local sharing inside the primfunc. The plan is to move the StorageRewrite pass down after the USMP passes, in case there is anything left that StorageRewrite could help with.





[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-859519657


   Hi @areusch, 
   Thanks for your reply. I think what you are saying can be summarized as:
   a) The StorageRewrite pass does not solve our problem because it can still touch the output buffers (which are parameters of the `tir PrimFunc`)
   b) The solution that we have put in place in `StorageRewrite` does not address the reference counting in a general way. 
   
   About a), I disagree. `StorageRewrite` only rewrites `tir.Allocate` nodes, and the output is not allocated in our main runner function. This is why if you run mobilenet quantized with the `StorageRewrite` pass, it works (and it doesn't work with Graph Memory Planning). So `StorageRewrite` is a perfectly good candidate to solve the issue we have in #8062 . The problem with `StorageRewrite` is that it doesn't handle `tvm_struct_set` correctly, and this brings us to point b)
   
   About b), I agree. The solution we are proposing is not general, but any general solution can be built on top of it. As you said, we are trying to solve a specific case, but when you tackle the general case, the code I wrote is still valid. Especially in a TIR-based world (which is the aim of TensorIR, afaiu), I think we will (at some point) want to extend this solution instead of accepting that `StorageRewrite` does not work with some TIR built-ins. 
   
   In other words, our solution is not tailored to AOT at all. It is a partial solution to a generic problem that (only) AOT is hitting (for now). It cannot break anything in the future: either the user is doing what AOT is doing, and it will work, or, in more edge cases, `StorageRewrite` won't work, just as it didn't work before. 
   
   Bear in mind also that, until we have a static memory planner, AOT is not usable without this solution. If you want, we can try to write a test that stresses the problem we are trying to solve with the change in `StorageRewrite`. 
   
   What do you think?





[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-865367080


   Hi @manupa-arm , @areusch , 
   Here is a second attempt to resolve this. I basically applied what we discussed. I also cleaned up the produced TIR a bit by removing all the `tir::Evaluate(0)` statements. Please let me know what you guys think!





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-865367080


   Hi @manupa-arm , @areusch , 
   Here is a second attempt to resolve this. I basically applied what we discussed. I also cleaned up the produced TIR a bit by removing all the `tir::Evaluate(0)` statements. Please let me know what you guys think!





[GitHub] [tvm] giuseros commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r646944601



##########
File path: src/tir/transforms/storage_rewrite.cc
##########
@@ -138,6 +138,34 @@ class LinearAccessPatternFinder final : public StmtExprVisitor {
     if (op->op.same_as(builtin::address_of())) {
       const LoadNode* l = op->args[0].as<LoadNode>();
       this->VisitExpr(l->index);
+    } else if (op->op.same_as(builtin::tvm_call_cpacked())) {
+      // Recall that the arguments of a tvm_call_cpacked are passed as
+      // TVMValues. But a TVMValue is only a container, that points to
+      // a real buffer previously allocated. We need to signal that those
+      // buffers need to be live at the same time (i.e., cannot be overridden)
+      Array<PrimExpr> args = op->args;
+      for (auto arg : args) {
+        const VarNode* var = arg.as<VarNode>();
+        if (value_to_alloc_.find(var) != value_to_alloc_.end()) {
+          auto allocs = value_to_alloc_[var];
+          for (const VarNode* alloc : allocs) {
+            VisitExpr_(alloc);
+          }
+        } else {
+          this->VisitExpr(arg);
+        }
+      }
+    } else if (op->op.same_as(builtin::tvm_struct_set())) {

Review comment:
       So, for `call_cpacked`, we don't care about accessing the data (`tvm_struct_get`), but only about setting the data of the TVMValue. But in general it is true that other patterns might be problematic if used. 
   
   I am not sure about `tvm_struct_get` in particular, because you would use the (real) data you extracted, while with `tvm_struct_set` you use the real data to set the structure and then you pass the structure in the call. This change is only there to annotate that the struct is a mere container, and that the rewriter needs to focus on the real data. 
   
   I would say that this PR is not trying to fix the rewriter on all the possible corner cases. This particular corner case is affecting AOT, and it needs to be fixed. If there are other corner cases, I would fix them in a separate PR. 







[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-865367080


   Hi @manupa-arm , @areusch , 
   Here is a second attempt to resolve this. I basically applied what we discussed. I also cleaned up the produced TIR a bit by removing all the `tir::Evaluate(0)` statements. Please let me know what you guys think!





[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-860715052


   Hi @areusch , 
   After some thinking we came up with a different solution. The way we are doing things now is the following:
   pass-a) Compose the `main_func` with `tvm_set_struct`, i.e., codegen ready
   pass-b) Storage rewrite modified to take care of the structs
   pass-c) tvm::build
   
   A possible alternative can be the following:
   pass-a2) Compose the `main_func` without `tvm_set_struct`. This means that the packed calls will receive raw pointers. This in turn means that the `main_func` in TIR is not ready to be code generated
   pass-b2) Storage rewrite without any change. This is possible now since we are not using `tvm_set_struct` yet
   pass-c2) transform the packed calls inputs by using `tvm_set_struct`. We can actually avoid this pass if we are using unpacked signatures. 
   
   With pipeline-2 we can leave `StorageRewrite` unchanged and still avoid the issues that `tvm_set_struct` gives us. What do you think?
   





[GitHub] [tvm] mehrdadh commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
mehrdadh commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-845372478


   thanks @giuseros for working on this! I tested it and it solves the issue that I had, so I'll close the issue assuming this will merge sometime soon.










[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-848653432


   Hi @areusch , @manupa-arm , @mehrdadh ,
   Quick update on this. We found an issue in the storage rewrite pass. The problem is due to the fact that the packed functions accept TVMValues and not bare pointers. This means that in this TIR:
    ```
      let sid_5_value_1: handle = @tir.tvm_stack_alloca("array", 1, dtype=handle)
       @tir.tvm_struct_set(sid_5_value_1, 0, 1, sid_5, dtype=handle)
       let sid_4_value: handle = @tir.tvm_stack_alloca("array", 1, dtype=handle)
       @tir.tvm_struct_set(sid_4_value, 0, 1, sid_4, dtype=handle)
        {
         @tir.tvm_call_cpacked("fused_transpose", sid_5_value_1, sid_4_value, dtype=int32)
       }
   ```
   The rewriter is able to identify `sid_5_value_1` and `sid_4_value` as both live variables, and they are not overridden. But it fails to see that those variables relate to `sid_5` and `sid_4` (respectively).
   
   Early on, in the  calls: 
   ```
   @tir.tvm_struct_set(sid_5_value_1, 0, 1, sid_5, dtype=handle)
   @tir.tvm_struct_set(sid_4_value, 0, 1, sid_4, dtype=handle)
   ```
   The variables `sid_5` and `sid_4` are treated as separate variables (that are NOT live at the same time) and hence get overridden. In the last commit I explicitly create a translation table between the TVMValues and the allocations. This seems to fix the issue. I have also added a further test case, `test_transpose` (disabling op-fusion in Relay), to test this scenario. 
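   The translation-table idea can be sketched in isolation. This is an illustrative C++ sketch, not the actual StorageRewrite code (`AliasTable` and `LiveAllocsAtCall` are invented names): when a handle passed to a cpacked call is known to wrap an allocation, liveness is attributed to the underlying buffer rather than to the handle.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps a TVMValue-like handle (e.g. "sid_5_value_1") to the allocations it
// wraps (e.g. {"sid_5"}), mirroring the value-to-allocation translation table.
using AliasTable = std::map<std::string, std::vector<std::string>>;

// Computes which allocations must be considered live at a cpacked call:
// handles are looked through, and bare buffers count as themselves.
std::set<std::string> LiveAllocsAtCall(const std::vector<std::string>& call_args,
                                       const AliasTable& value_to_alloc) {
  std::set<std::string> live;
  for (const auto& arg : call_args) {
    auto it = value_to_alloc.find(arg);
    if (it != value_to_alloc.end()) {
      // The TVMValue is only a container; what is really live is the
      // buffer (or buffers) it points to.
      live.insert(it->second.begin(), it->second.end());
    } else {
      live.insert(arg);
    }
  }
  return live;
}
```

   With a table relating `sid_5_value_1` to `sid_5` and `sid_4_value` to `sid_4`, the call above makes `sid_5` and `sid_4` simultaneously live, so the planner can no longer let one override the other.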
   





[GitHub] [tvm] jroesch edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
jroesch edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-870097981


   @giuseros I would prefer to also use an array of structs, but my goal is to just get us to use the same data structure everywhere since there are multiple places where we are storing different versions of the same data without a shared data structure. We could move the Array outside and simplify the struct but it would be good to move memory planning to also use the same structure. 
   
   I guess one solution could be merge as is, I can rewrite the MemoryPlanning to use the Array of struct and then you can update to use the same structure. 





[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-856117671


   Done! Thanks @mehrdadh 





[GitHub] [tvm] manupa-arm commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r637038120



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -390,8 +518,8 @@ class AOTExecutorCodegen : public ExprVisitor {
     }
 
     ICHECK_GE(storage_device_map_.count(expr), 0);
-    auto& device_type = storage_device_map_[expr][1];
-    auto call_dev_type = device_type[0]->value;
+    auto& device_type = storage_device_map_[expr].dev_types;
+    auto call_dev_type = device_type[0];

Review comment:
       I can't seem to find where this is used. Can you point me to it?










[GitHub] [tvm] giuseros edited a comment on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros edited a comment on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-860715052


   Hi @areusch , 
   After some thinking we came up with a different solution. The way we are doing things now is the following:
   pass-a) Compose the `main_func` with `tvm_set_struct`, i.e., codegen ready
   pass-b) Storage rewrite modified to take care of the structs
   pass-c) `tvm::build`
   
   A possible alternative can be the following:
   pass-a2) Compose the `main_func` without `tvm_set_struct`. This means that the packed calls will receive raw pointers. This in turn means that the `main_func` in TIR is not ready to be code generated
   pass-b2) Storage rewrite without any change. This is possible now since we are not using `tvm_set_struct` yet
   pass-c2) transform the packed calls inputs by using `tvm_set_struct`. We can actually avoid this pass if we are using unpacked signatures. 
   pass-d2) `tvm::build`
   
   With pipeline 2 we can leave `StorageRewrite` unchanged and still work around the issues that `tvm_set_struct` gives us. What do you think?
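   For illustration only, the alternative pipeline could be sketched like this, with hypothetical pass names standing in for the real TVM passes:

```python
# Illustrative sketch: hypothetical pass names for the alternative pipeline.
def build_pipeline(unpacked_api):
    passes = [
        "compose_main_func_raw_pointers",  # pass-a2: packed calls get raw pointers
        "storage_rewrite_unchanged",       # pass-b2: stock StorageRewrite suffices
    ]
    if not unpacked_api:
        # pass-c2: only needed when calls keep the packed (TVMValue) signature
        passes.append("legalize_with_tvm_set_struct")
    passes.append("tvm_build")             # pass-d2
    return passes

print(build_pipeline(unpacked_api=True))
```

   Note how pass-c2 disappears entirely when unpacked signatures are used, which is the main simplification this ordering buys.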
   








[GitHub] [tvm] manupa-arm commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
manupa-arm commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r636732925



##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -46,13 +47,171 @@ namespace backend {
 
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<std::vector<int>>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
+ public:
+  // run the visitor on a function.
+  StorageMap Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
+
+    for (Expr param : func->params) {
+      CreateSid(param.operator->());
+    }
+
+    GetSid(func->body);
+    return storage_device_map_;
+  }
+
+  void VisitExpr_(const ConstantNode* op) final { CreateSid(op); }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateSid(op);
+    for (Expr arg : op->args) {
+      GetSid(arg);
+    }
+  }
+
+  void VisitExpr_(const VarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
+  }
+
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<int> field_ids;
+    std::vector<int> field_sizes;
+    std::vector<int> field_types;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sids = GetSid(field);
+      field_ids.insert(field_ids.end(), sids[0].begin(), sids[0].end());
+      field_types.insert(field_types.end(), sids[1].begin(), sids[1].end());
+      field_sizes.insert(field_sizes.end(), sids[2].begin(), sids[2].end());
+    }
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = field_ids;
+    storage_device_map_[expr][1] = field_sizes;
+    storage_device_map_[expr][2] = field_types;
+  }
+
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sids = GetSid(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sids.size());
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = {sids[0][op->index]};
+    storage_device_map_[expr][1] = {sids[1][op->index]};
+    storage_device_map_[expr][2] = {sids[2][op->index]};
+  }
+
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+ private:
+  /*!
+   * \brief ceil(size/word_size) to get number of words.
+   * \param size The original size.
+   * \param word_size The element size.
+   */
+  static size_t DivRoundUp(size_t size, size_t word_size) {
+    return (size + word_size - 1) / word_size;
+  }
+  /*!
+   * \brief Get the memory requirement.
+   * \param prototype The prototype token.
+   * \return The required memory size.
+   */
+  size_t GetMemorySize(const TensorTypeNode* ttype) {
+    ICHECK(ttype != nullptr);
+    size_t size = 1;
+    for (IndexExpr dim : ttype->shape) {
+      const int64_t* pval = tir::as_const_int(dim);
+      ICHECK(pval != nullptr) << "Cannot allocate memory symbolic tensor shape " << ttype->shape;
+      ICHECK_GE(*pval, 0) << "Cannot allocate memory for tensor with negative shape" << *pval;
+      size *= static_cast<size_t>(pval[0]);
+    }
+    size *= DivRoundUp(ttype->dtype.bits() * ttype->dtype.lanes(), 8);
+    return size;
+  }
+  /*!
+   * \brief Get the necessary token.
+   * \param expr The expression.
+   * \return The corresponding token.
+   */
+  std::vector<std::vector<int>> GetSid(const Expr& expr) {
+    this->VisitExpr(expr);
+    auto it = storage_device_map_.find(expr);
+    ICHECK(it != storage_device_map_.end());
+    return it->second;
+  }
+
+  void CreateSid(const ExprNode* op) {
+    std::vector<int> sids;
+    std::vector<int> sizes;
+    std::vector<int> types;
+    Expr expr = GetRef<Expr>(op);
+    int device_type = node_device_map_.count(GetRef<Expr>(op)) ? node_device_map_[expr]->value : 0;
+    if (const auto* tuple_type = op->checked_type().as<TupleTypeNode>()) {
+      for (Type t : tuple_type->fields) {
+        const auto* ttype = t.as<TensorTypeNode>();
+        ICHECK(ttype);
+        sids.push_back(sid_++);
+        types.push_back(device_type);
+        sizes.push_back(GetMemorySize(ttype));
+      }
+    } else {
+      const auto* ttype = op->checked_type().as<TensorTypeNode>();
+      ICHECK(ttype);
+      sids.push_back(sid_++);
+      types.push_back(device_type);
+      sizes.push_back(GetMemorySize(ttype));
+    }
+    InitStorage(expr);
+    storage_device_map_[expr][0] = sids;
+    storage_device_map_[expr][1] = types;
+    storage_device_map_[expr][2] = sizes;

Review comment:
       This seems inconsistent with L110.
   Maybe it's better to have a struct describing the attributes rather than a vector. WDYT?

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -46,13 +47,171 @@ namespace backend {
 
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<std::vector<int>>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
+ public:
+  // run the visitor on a function.
+  StorageMap Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
+
+    for (Expr param : func->params) {
+      CreateSid(param.operator->());
+    }
+
+    GetSid(func->body);
+    return storage_device_map_;
+  }
+
+  void VisitExpr_(const ConstantNode* op) final { CreateSid(op); }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateSid(op);
+    for (Expr arg : op->args) {
+      GetSid(arg);
+    }
+  }
+
+  void VisitExpr_(const VarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
+  }
+
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<int> field_ids;
+    std::vector<int> field_sizes;
+    std::vector<int> field_types;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sids = GetSid(field);
+      field_ids.insert(field_ids.end(), sids[0].begin(), sids[0].end());
+      field_types.insert(field_types.end(), sids[1].begin(), sids[1].end());
+      field_sizes.insert(field_sizes.end(), sids[2].begin(), sids[2].end());
+    }
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = field_ids;
+    storage_device_map_[expr][1] = field_sizes;
+    storage_device_map_[expr][2] = field_types;

Review comment:
       Is this the device type? (I am wondering why storage_device_map_ does not contain any device.)

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -46,13 +47,171 @@ namespace backend {
 
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<std::vector<int>>, runtime::ObjectPtrHash,

Review comment:
       What does std::vector<std::vector<int>> represent? It may be better to describe what the elements are.

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -46,13 +47,171 @@ namespace backend {
 
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<std::vector<int>>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
+ public:
+  // run the visitor on a function.
+  StorageMap Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
+
+    for (Expr param : func->params) {
+      CreateSid(param.operator->());
+    }
+
+    GetSid(func->body);
+    return storage_device_map_;
+  }
+
+  void VisitExpr_(const ConstantNode* op) final { CreateSid(op); }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateSid(op);
+    for (Expr arg : op->args) {
+      GetSid(arg);
+    }
+  }
+
+  void VisitExpr_(const VarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
+  }
+
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<int> field_ids;
+    std::vector<int> field_sizes;
+    std::vector<int> field_types;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sids = GetSid(field);
+      field_ids.insert(field_ids.end(), sids[0].begin(), sids[0].end());
+      field_types.insert(field_types.end(), sids[1].begin(), sids[1].end());
+      field_sizes.insert(field_sizes.end(), sids[2].begin(), sids[2].end());
+    }
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = field_ids;
+    storage_device_map_[expr][1] = field_sizes;
+    storage_device_map_[expr][2] = field_types;
+  }
+
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sids = GetSid(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sids.size());
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = {sids[0][op->index]};
+    storage_device_map_[expr][1] = {sids[1][op->index]};
+    storage_device_map_[expr][2] = {sids[2][op->index]};
+  }
+
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+ private:
+  /*!
+   * \brief ceil(size/word_size) to get number of words.
+   * \param size The original size.
+   * \param word_size The element size.
+   */
+  static size_t DivRoundUp(size_t size, size_t word_size) {
+    return (size + word_size - 1) / word_size;
+  }
+  /*!
+   * \brief Get the memory requirement.
+   * \param prototype The prototype token.
+   * \return The required memory size.
+   */
+  size_t GetMemorySize(const TensorTypeNode* ttype) {
+    ICHECK(ttype != nullptr);
+    size_t size = 1;
+    for (IndexExpr dim : ttype->shape) {
+      const int64_t* pval = tir::as_const_int(dim);
+      ICHECK(pval != nullptr) << "Cannot allocate memory symbolic tensor shape " << ttype->shape;
+      ICHECK_GE(*pval, 0) << "Cannot allocate memory for tensor with negative shape" << *pval;
+      size *= static_cast<size_t>(pval[0]);
+    }
+    size *= DivRoundUp(ttype->dtype.bits() * ttype->dtype.lanes(), 8);
+    return size;
+  }
+  /*!
+   * \brief Get the necessary token.
+   * \param expr The expression.
+   * \return The corresponding token.
+   */
+  std::vector<std::vector<int>> GetSid(const Expr& expr) {
+    this->VisitExpr(expr);
+    auto it = storage_device_map_.find(expr);
+    ICHECK(it != storage_device_map_.end());
+    return it->second;
+  }
+
+  void CreateSid(const ExprNode* op) {
+    std::vector<int> sids;
+    std::vector<int> sizes;
+    std::vector<int> types;
+    Expr expr = GetRef<Expr>(op);
+    int device_type = node_device_map_.count(GetRef<Expr>(op)) ? node_device_map_[expr]->value : 0;
+    if (const auto* tuple_type = op->checked_type().as<TupleTypeNode>()) {
+      for (Type t : tuple_type->fields) {
+        const auto* ttype = t.as<TensorTypeNode>();
+        ICHECK(ttype);
+        sids.push_back(sid_++);
+        types.push_back(device_type);
+        sizes.push_back(GetMemorySize(ttype));
+      }
+    } else {
+      const auto* ttype = op->checked_type().as<TensorTypeNode>();
+      ICHECK(ttype);
+      sids.push_back(sid_++);
+      types.push_back(device_type);
+      sizes.push_back(GetMemorySize(ttype));
+    }
+    InitStorage(expr);
+    storage_device_map_[expr][0] = sids;
+    storage_device_map_[expr][1] = types;
+    storage_device_map_[expr][2] = sizes;
+  }
+
+  void InitStorage(Expr expr) {
+    if (storage_device_map_[expr].empty()) {
+      storage_device_map_[expr].push_back(std::vector<int>());
+      storage_device_map_[expr].push_back(std::vector<int>());
+      storage_device_map_[expr].push_back(std::vector<int>());
+    }
+  }
+
+  StorageMap storage_device_map_;
+  Map<Expr, Integer> node_device_map_;
+  int sid_{0};
+};
 
 class AotReturnSidVisitor : public ExprVisitor {
  public:
-  explicit AotReturnSidVisitor(Map<Expr, Array<IntegerArray>> storage_device_map)
+  explicit AotReturnSidVisitor(StorageMap storage_device_map)

Review comment:
       It's a bit unclear to me what the purpose of this pass is. It might be better to add a brief describing the purpose and how it is used in the code generation.

##########
File path: src/relay/backend/aot_executor_codegen.cc
##########
@@ -46,13 +47,171 @@ namespace backend {
 
 using IntegerArray = Array<Integer>;
 using TargetsMap = std::unordered_map<int, Target>;
+using StorageMap = std::unordered_map<Expr, std::vector<std::vector<int>>, runtime::ObjectPtrHash,
+                                      runtime::ObjectPtrEqual>;
+/**
+ * This is an on demand allocator for AOT. A new temporary
+ * (storage allocator identifier) is allocated for each operation.
+ */
+class AOTOnDemandAllocator : public ExprVisitor {
+ public:
+  // run the visitor on a function.
+  StorageMap Run(const Function& func) {
+    node_device_map_ = CollectDeviceInfo(func);
+
+    for (Expr param : func->params) {
+      CreateSid(param.operator->());
+    }
+
+    GetSid(func->body);
+    return storage_device_map_;
+  }
+
+  void VisitExpr_(const ConstantNode* op) final { CreateSid(op); }
+
+  void VisitExpr_(const CallNode* op) final {
+    // create token for the call node.
+    CreateSid(op);
+    for (Expr arg : op->args) {
+      GetSid(arg);
+    }
+  }
+
+  void VisitExpr_(const VarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const FunctionNode* op) final {
+    // do not recurse into sub function.
+  }
+
+  void VisitExpr_(const GlobalVarNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const OpNode* op) final {
+    // Do nothing.
+  }
+
+  void VisitExpr_(const TupleNode* op) final {
+    std::vector<int> field_ids;
+    std::vector<int> field_sizes;
+    std::vector<int> field_types;
+    Expr expr = GetRef<Expr>(op);
+    for (Expr field : op->fields) {
+      auto sids = GetSid(field);
+      field_ids.insert(field_ids.end(), sids[0].begin(), sids[0].end());
+      field_types.insert(field_types.end(), sids[1].begin(), sids[1].end());
+      field_sizes.insert(field_sizes.end(), sids[2].begin(), sids[2].end());
+    }
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = field_ids;
+    storage_device_map_[expr][1] = field_sizes;
+    storage_device_map_[expr][2] = field_types;
+  }
+
+  void VisitExpr_(const TupleGetItemNode* op) final {
+    Expr expr = GetRef<Expr>(op);
+    const auto& sids = GetSid(op->tuple);
+    ICHECK_LT(static_cast<size_t>(op->index), sids.size());
+    if (storage_device_map_[expr].empty()) {
+      InitStorage(expr);
+    }
+    storage_device_map_[expr][0] = {sids[0][op->index]};
+    storage_device_map_[expr][1] = {sids[1][op->index]};
+    storage_device_map_[expr][2] = {sids[2][op->index]};
+  }
+
+  void VisitExpr_(const IfNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+  void VisitExpr_(const LetNode* op) final { LOG(FATAL) << "if is not supported."; }
+
+ private:
+  /*!
+   * \brief ceil(size/word_size) to get number of words.
+   * \param size The original size.
+   * \param word_size The element size.
+   */
+  static size_t DivRoundUp(size_t size, size_t word_size) {
+    return (size + word_size - 1) / word_size;
+  }
+  /*!
+   * \brief Get the memory requirement.
+   * \param prototype The prototype token.
+   * \return The required memory size.
+   */
+  size_t GetMemorySize(const TensorTypeNode* ttype) {
+    ICHECK(ttype != nullptr);
+    size_t size = 1;
+    for (IndexExpr dim : ttype->shape) {
+      const int64_t* pval = tir::as_const_int(dim);
+      ICHECK(pval != nullptr) << "Cannot allocate memory symbolic tensor shape " << ttype->shape;
+      ICHECK_GE(*pval, 0) << "Cannot allocate memory for tensor with negative shape" << *pval;
+      size *= static_cast<size_t>(pval[0]);
+    }
+    size *= DivRoundUp(ttype->dtype.bits() * ttype->dtype.lanes(), 8);
+    return size;
+  }
+  /*!
+   * \brief Get the necessary token.
+   * \param expr The expression.
+   * \return The corresponding token.
+   */
+  std::vector<std::vector<int>> GetSid(const Expr& expr) {
+    this->VisitExpr(expr);
+    auto it = storage_device_map_.find(expr);
+    ICHECK(it != storage_device_map_.end());
+    return it->second;
+  }
+
+  void CreateSid(const ExprNode* op) {
+    std::vector<int> sids;
+    std::vector<int> sizes;
+    std::vector<int> types;
+    Expr expr = GetRef<Expr>(op);
+    int device_type = node_device_map_.count(GetRef<Expr>(op)) ? node_device_map_[expr]->value : 0;
+    if (const auto* tuple_type = op->checked_type().as<TupleTypeNode>()) {
+      for (Type t : tuple_type->fields) {
+        const auto* ttype = t.as<TensorTypeNode>();
+        ICHECK(ttype);
+        sids.push_back(sid_++);
+        types.push_back(device_type);
+        sizes.push_back(GetMemorySize(ttype));
+      }
+    } else {
+      const auto* ttype = op->checked_type().as<TensorTypeNode>();
+      ICHECK(ttype);
+      sids.push_back(sid_++);
+      types.push_back(device_type);
+      sizes.push_back(GetMemorySize(ttype));
+    }
+    InitStorage(expr);
+    storage_device_map_[expr][0] = sids;
+    storage_device_map_[expr][1] = types;

Review comment:
       Where is this (types.push_back(device_type);) used in this code?







[GitHub] [tvm] giuseros commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
giuseros commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-845244133


   cc: @manupa-arm @MatthewARM @mehrdadh @areusch 





[GitHub] [tvm] jroesch commented on pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
jroesch commented on pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#issuecomment-870097981


   @giuseros I would prefer to also use an array of structs, but my goal is to just get us to use the same data structure everywhere since there are multiple places where we are storing different versions of the same data without a shared data structure. We could move the Array outside and simplify the struct but it would be good to move memory planning to also use the same structure. 





[GitHub] [tvm] areusch commented on a change in pull request #8096: Decoupling AOT from graph memory planner

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #8096:
URL: https://github.com/apache/tvm/pull/8096#discussion_r659091345



##########
File path: src/tir/transforms/legalize_packed_calls.cc
##########
@@ -60,30 +60,41 @@ class PackedCallLegalizer : public StmtExprMutator {
     if (call) {
       if (call->op.same_as(builtin::tvm_call_cpacked())) {
         Array<PrimExpr> packed_args{call->args[0]};
+        std::vector<tir::Var> tvm_values;
         for (unsigned i = 1; i < call->args.size(); i++) {
           // No need to pack inputs of the prim_func
           if (inputs_[call->args[i]] == true) {
             packed_args.push_back(call->args[i]);
           } else {
             // Pack the argument inside a TVMValue
-            auto sid_array = tir::Var("tvm_value", DataType::Handle());
-            tir::Stmt set_struct_stmt = tir::Evaluate(
+            std::stringstream ss;
+            ss << "tvm_value_" << tvm_value_index_++;
+            auto sid_array = tir::Var(ss.str(), DataType::Handle());
+            tvm_values.push_back(sid_array);
+
+            new_stmts.push_back(tir::Evaluate(
                 tvm::tir::Call(DataType::Handle(), tvm::tir::builtin::tvm_struct_set(),
-                               {sid_array, 0, tir::builtin::kArrData, call->args[i]}));
-            new_stmts.push_back(LetStmt(sid_array, StackAlloca("array", 1), set_struct_stmt));
+                               {sid_array, 0, tir::builtin::kArrData, call->args[i]})));
             packed_args.push_back(sid_array);
           }
         }
-        // Finally, evaluate the packed call and return a sequential statement
+        // Evaluate the packed call
         new_stmts.push_back(tir::Evaluate(tir::Call(call->dtype, call->op, packed_args)));
-        return tir::SeqStmt(new_stmts);
+        tir::Stmt call_stmt = tir::SeqStmt(new_stmts);
+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);

Review comment:
       Just wondering if we may want to eventually create a separate memory pool for these, and whether you'd be open to optionally converting these into tir.allocate in a future PR.



