Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/05/18 20:52:00 UTC

[GitHub] [tvm] mbs-octoml commented on a diff in pull request #11358: WIP: [Virtual Machine] Implementation of 'set_output_zero_copy'

mbs-octoml commented on code in PR #11358:
URL: https://github.com/apache/tvm/pull/11358#discussion_r876367929


##########
src/runtime/vm/vm.cc:
##########
@@ -521,13 +603,25 @@ int64_t VirtualMachine::LoadScalarInt(Index r) const {
   return result;
 }
 
-void VirtualMachine::RunLoop() {
+Index VirtualMachine::GetResultRegisterIndex() const {
+  Index op_ind = 0;
+  Instruction instr;
+  // TODO(vvchernov): can it be endless loop?
+  do {
+    instr = code_[op_ind++];
+  } while (instr.op == Opcode::Ret);
+
+  return instr.result;
+}
+
+void VirtualMachine::RunLoop(bool set_output_enabled) {
   ICHECK(this->exec_);
   ICHECK(this->code_);
   pc_ = 0;
   Index frame_start = frames_.size();
-  while (true) {
-  main_loop:
+  Index res_reg_index = GetResultRegisterIndex();

Review Comment:
   For tuple results we'd need to redirect the allocs for each element of the tuple, and those may be scattered quite widely within the code, so that suggests the compiler should record metadata for all of them.
   
   But I'm wondering if it would be better to bite the bullet and switch the VM to DPS (destination-passing style). After that, only Invoke-without-outputs would be the special case, requiring inspection of metadata to alloc the output tensors, make the call, and optionally construct the tuple result. I know that's a much bigger change, and I guess we could make it after this PR given that the invoke APIs would be the same.
   
   Is there a high-pressure customer use case that justifies the two-step approach?
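   
   For illustration only, here's a rough standalone sketch of the contrast between the current return-value style and a DPS-style invoke. The `Tensor`, `InvokeReturning`, and `InvokeDPS` names are placeholders invented for this sketch, not TVM's runtime API; the point is that under DPS the caller allocates every output (including each element of a tuple result) and passes it in, so nothing inside the VM needs to be redirected for zero-copy output.
   
   ```cpp
   // Hypothetical standalone sketch -- names and types are placeholders, not TVM's API.
   #include <vector>
   
   struct Tensor {};  // opaque stand-in for a runtime tensor
   
   // Current style: the callee allocates its result internally, so zero-copy
   // output requires redirecting that allocation to the user's buffer.
   Tensor InvokeReturning(const std::vector<Tensor>& args) {
     Tensor result;  // allocated inside the VM
     // ... run the function, fill `result` ...
     return result;
   }
   
   // DPS style: the caller allocates every output (one per tuple element) and
   // passes it in; the VM just writes into the buffers it is handed.
   void InvokeDPS(const std::vector<Tensor>& args, const std::vector<Tensor*>& outputs) {
     // ... run the function, writing directly into *outputs[i] ...
   }
   ```
   
   With that shape, the no-outputs invoke is the only path that needs metadata, to allocate `outputs` before the call and optionally wrap them into a tuple afterwards.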
   



##########
src/runtime/vm/vm.cc:
##########
@@ -537,7 +631,7 @@ void VirtualMachine::RunLoop() {
         from_obj = ReadRegister(instr.from);
         WriteRegister(instr.dst, from_obj);
         pc_++;
-        goto main_loop;
+        break;

Review Comment:
   nit: This weird control flow may be the result of someone trying to improve instruction-cache (I$) behavior, so we should check. I recall there's a known issue with the VM being slightly slower than the GraphExecutor due to cache issues, but I suspect that's more likely due to data-cache (D$) effects from the extra indirections around registers or something.
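   
   Purely to illustrate the two shapes being compared (the opcodes and handler bodies below are invented for the sketch and are not the VM's real instruction set; which form is actually friendlier to the instruction cache is exactly what would need measuring):
   
   ```cpp
   #include <cstddef>
   
   enum class Opcode { Move, Ret };   // toy opcodes for illustration
   struct Instruction { Opcode op; }; // minimal stand-in for the VM's Instruction
   
   // Shape A (what the code had): each handler jumps back to an explicit label.
   void DispatchGoto(const Instruction* code, std::size_t pc) {
     while (true) {
     main_loop:
       const Instruction& instr = code[pc];
       switch (instr.op) {
         case Opcode::Move:
           ++pc;
           goto main_loop;  // jump straight back to the top of the loop
         case Opcode::Ret:
           return;
       }
     }
   }
   
   // Shape B (what the PR changes it to): an ordinary `break` leaves the switch
   // and the enclosing `while` restarts the dispatch.
   void DispatchBreak(const Instruction* code, std::size_t pc) {
     while (true) {
       const Instruction& instr = code[pc];
       switch (instr.op) {
         case Opcode::Move:
           ++pc;
           break;           // fall out of the switch; the loop continues
         case Opcode::Ret:
           return;
       }
     }
   }
   ```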



##########
src/runtime/vm/vm.cc:
##########
@@ -521,13 +603,25 @@ int64_t VirtualMachine::LoadScalarInt(Index r) const {
   return result;
 }
 
-void VirtualMachine::RunLoop() {
+Index VirtualMachine::GetResultRegisterIndex() const {
+  Index op_ind = 0;
+  Instruction instr;
+  // TODO(vvchernov): can it be endless loop?
+  do {
+    instr = code_[op_ind++];
+  } while (instr.op == Opcode::Ret);

Review Comment:
   != Opcode::Ret
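   
   To make the fix concrete: with the condition corrected, the scan loops *until* it hits the first `Ret`. A minimal standalone sketch is below; the bounds check is one possible answer to the `TODO` about an endless loop, since even with the condition fixed the scan would run off the end of `code_` if no `Ret` existed. The `Opcode`, `Instruction`, and container types here are local stand-ins for the sketch, not the VM's real ones.
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <vector>
   
   using Index = int64_t;
   enum class Opcode { Move, Ret };                  // toy opcodes for illustration
   struct Instruction { Opcode op; Index result; };  // local stand-in, not the VM's struct
   
   // Scan forward for the first Ret and report which register it returns.
   Index GetResultRegisterIndex(const std::vector<Instruction>& code) {
     Index op_ind = 0;
     Instruction instr{};
     do {
       assert(op_ind < static_cast<Index>(code.size()) && "no Ret found; would run off the end");
       instr = code[op_ind++];
     } while (instr.op != Opcode::Ret);  // loop *until* Ret is found, not *while* sitting on it
     return instr.result;
   }
   ```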



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org