Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/24 11:27:18 UTC

[GitHub] [tvm] mikepapadim opened a new pull request #9103: Move build module transformations and utilities to C++

mikepapadim opened a new pull request #9103:
URL: https://github.com/apache/tvm/pull/9103


   This PR builds on top of @electriclilies' initial work to move and unify the build logic from `build_module.py` into the C++ `driver_api.cc`.
   It removes redundant optimization passes and introduces helper functions for target mangling in the driver_api.
   
   @mbs-octoml @jroesch @electriclilies @areusch 





[GitHub] [tvm] areusch commented on a change in pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#discussion_r720850402



##########
File path: python/tvm/driver/build_module.py
##########
@@ -297,28 +190,26 @@ def build(
           m1 = tvm.lower(s1, [A, B, C], name="test_add1")
           m2 = tvm.lower(s2, [A, B, C], name="test_add2")
           rt_mod = tvm.build({"llvm": m1, "cuda": m2}, target_host="llvm")
-
     Note
     ----
     See the note on :any:`tvm.target` on target string format.
     """
-    if isinstance(inputs, schedule.Schedule):
-        if args is None:
-            raise ValueError("args must be given for build from schedule")
+
+    if isinstance(inputs, (schedule.Schedule, tvm.IRModule, PrimFunc)):
+        # TODO(@mikepapadim) replace with te_lower

Review comment:
       just wondering why the `if args is None` went away?
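
       e.g. it could be preserved inside the merged branch along these lines (untested sketch against the new code above, not something from this PR):

           if isinstance(inputs, (schedule.Schedule, tvm.IRModule, PrimFunc)):
               # Preserve the old error for schedules lowered without explicit args.
               if isinstance(inputs, schedule.Schedule) and args is None:
                   raise ValueError("args must be given for build from schedule")
               input_mod = lower(inputs, args, name=name, binds=binds)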

##########
File path: src/driver/driver_api.cc
##########
@@ -155,6 +156,13 @@ transform::Pass BindTarget(Target target) {
   return tir::transform::CreatePrimFuncPass(fpass, 0, "BindTarget", {});
 }
 
+transform::Pass AnnotateEntryFunc(bool b) {

Review comment:
       should this be static, as it's mainly a helper, or do you envision others calling it from outside this module?

##########
File path: python/tvm/target/codegen.py
##########
@@ -36,7 +36,7 @@ def build_module(mod, target):
         The corresponding module.
     """
     target = Target(target) if isinstance(target, str) else target
-    return _ffi_api.Build(mod, target)
+    return _ffi_api.Codegen(mod, target)

Review comment:
       should we also rename the containing function, which is just a wrapper? That would really mean renaming the containing function to `codegen` while adding a backwards-compat `build_module` that warns about deprecation (there are some examples using the warnings module, I believe).
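
       e.g. something like this (untested sketch; it assumes the `_ffi_api.Codegen` binding from this diff and a new `codegen` name):

           import warnings

           from tvm.target import Target

           from . import _ffi_api


           def codegen(mod, target):
               """Compile the IRModule to a runtime.Module for the given target."""
               target = Target(target) if isinstance(target, str) else target
               return _ffi_api.Codegen(mod, target)


           def build_module(mod, target):
               """Deprecated: use codegen() instead."""
               warnings.warn(
                   "build_module() is deprecated; use codegen() instead.",
                   DeprecationWarning,
                   stacklevel=2,
               )
               return codegen(mod, target)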

##########
File path: include/tvm/target/codegen.h
##########
@@ -45,7 +45,7 @@ using runtime::TVMRetValue;
  * \param target The target to be built.

Review comment:
       yeah, it would in particular be great to identify the content of the IRModule, either by a typedef or by a description in the comment.

##########
File path: src/driver/driver_api.cc
##########
@@ -373,88 +381,119 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+// Splits the module into one to run on the device and one to run on the host, e.g. for CUDA, OpenCL, etc.
+std::pair<IRModule, IRModule> SplitFuncsToDevHostMods(IRModule mod_mixed, const Target& target_arg,

Review comment:
       this name is probably fine but I also don't mind SplitMixedModule, since it seems we are making "mixed module" a thing.

##########
File path: src/driver/driver_api.cc
##########
@@ -373,88 +381,119 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+// Splits the module into one to run on the device and one to run on the host, e.g. for CUDA, OpenCL, etc.
+std::pair<IRModule, IRModule> SplitFuncsToDevHostMods(IRModule mod_mixed, const Target& target_arg,
+                                                      const Target& target_host_arg) {
   Target target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
-  Array<tvm::transform::Pass> mixed_pass_list = {BindTarget(target),
-                                                 tir::transform::VerifyMemory()};
 
-  mixed_pass_list.push_back(tir::transform::MergeDynamicSharedMemoryAllocations());
-  if (pass_ctx->GetConfig<Bool>("tir.detect_global_barrier", Bool(false)).value()) {
-    mixed_pass_list.push_back(tir::transform::ThreadSync("global"));
-  }
-  mixed_pass_list.push_back(tir::transform::ThreadSync("shared"));
-  mixed_pass_list.push_back(tir::transform::ThreadSync("warp"));
-  mixed_pass_list.push_back(tir::transform::InferFragment());
-  mixed_pass_list.push_back(tir::transform::LowerThreadAllreduce());
+  ICHECK(mod_mixed.defined()) << "This module must be defined";
 
-  if (target->GetAttr<Bool>("unpacked-api").value_or(Bool(false))) {
-    mixed_pass_list.push_back(tir::transform::MakeUnpackedAPI());
-  } else {
-    mixed_pass_list.push_back(tir::transform::MakePackedAPI(-1));
-  }
+  mod_mixed = OptimizeMixedModule(mod_mixed, target);
 
-  mixed_pass_list.push_back(tir::transform::SplitHostDevice());
+  auto host_mod = OptimizeHostModule(mod_mixed, target_host);
 
-  auto opt_mixed = transform::Sequential(mixed_pass_list);
-  mod_mixed = opt_mixed(std::move(mod_mixed));
-
-  auto host_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) !=
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target_host),
-      tir::transform::LowerTVMBuiltin(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-      tir::transform::CombineContextCall(),
-  };
-  auto opt_host = transform::Sequential(host_pass_list);
-  ICHECK(mod_mixed.defined()) << "This module must be defined";
-  auto mhost = opt_host(mod_mixed);
-
-  // device pipeline
-  auto device_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) ==
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target),
-      tir::transform::LowerWarpMemory(),
-      tir::transform::Simplify(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-  };
-  auto opt_device = transform::Sequential(device_pass_list);
-  auto mdevice = opt_device(mod_mixed);
+  auto device_mod = OptimizeDeviceModule(mod_mixed, target);
 
   // some final misc checks.
   auto keys = target->GetKeys();
   bool target_is_gpu = std::find(keys.begin(), keys.end(), "gpu") != keys.end();
-  if (target_is_gpu && mdevice->functions.size() == 0) {
+  if (target_is_gpu && device_mod->functions.size() == 0) {
     LOG(WARNING) << "Specified target " << target->str()
                  << " but cannot find device code. Did you forget to bind?";
   }
 
-  if (target->kind->device_type == kDLCPU && target_host == target) {
-    // TODO(@jroesch): This check is no longer true we need to figure out if we care about this.
-    // We need to relax this check for just TIR functions.
-    // ICHECK(mdevice->functions.empty()) << "No device code should be generated when target "
-    //                                   << "and host_target are both llvm target."
-    //                                   << "\n";
+  return {host_mod, device_mod};
+}
+
+std::pair<Target, Target> TargetTypeMangling(const Map<Target, IRModule>& inputs_arg, Target target,
+                                             Target target_host_arg) {
+  Target target_input_mod, target_host;
+
+  target = !target.defined() ? target.Current() : target;

Review comment:
       this function seems pretty deep, and there are odd semantics with `target` being optionally supplied from a global. Additionally, it's exported to frontends (Python). If the global export is the thing that makes this target override necessary, I suggest splitting that behavior out so that it's easier to reason about the internal function.







[GitHub] [tvm] mikepapadim commented on pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
mikepapadim commented on pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#issuecomment-937972179


   > Thanks @mikepapadim. One thing that would be very helpful is to split the logic into two parts:
   > 
   >     * A function that configures and returns the composite Pass (aka the Sequential)
   > 
   >     * A function that uses the above function to construct the composite pass and executes the optimization.
   > 
   > 
   > As part of the python-first principle, we want to enable ways to access and tweak optimizations in python easily, while the majority of the default logic can sit in C++. It would be super helpful to have a python function that returns the composite pipeline, so that we can do further customizations (in python) if necessary and then invoke it.
   
   Thanks @tqchen. So, we want to expose on the python side the ability to register additional passes for `mixed_mod`, `device_mod` and `host_mod` in each pass manager when we do the split?





[GitHub] [tvm] tqchen edited a comment on pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#issuecomment-937750416


   Thanks @mikepapadim. One thing that would be very helpful is to split the logic into two parts:
   - A function that configures and returns the composite Pass (aka the Sequential)
   - A function that uses the above function to construct the composite pass and executes the optimization.
   
   As part of the python-first principle, we want to enable ways to access and tweak optimizations in python easily, while the majority of the default logic can sit in C++. It would be super helpful to have a python function that returns the composite pipeline, so that we can do further customizations (in python) if necessary and then invoke it.
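   
   For example, roughly this shape (a sketch only; the helper names are hypothetical, though the pass constructors already exist on the python side):
   
       import tvm
   
   
       def mixed_module_pass_pipeline(target):
           # Part 1: only configure and return the composite pass.
           return tvm.transform.Sequential(
               [
                   tvm.tir.transform.Apply(lambda f: f.with_attr("target", target)),
                   tvm.tir.transform.VerifyMemory(),
                   tvm.tir.transform.ThreadSync("shared"),
                   tvm.tir.transform.ThreadSync("warp"),
                   tvm.tir.transform.InferFragment(),
                   tvm.tir.transform.LowerThreadAllreduce(),
                   tvm.tir.transform.MakePackedAPI(),
                   tvm.tir.transform.SplitHostDevice(),
               ]
           )
   
   
       def optimize_mixed_module(mod, target):
           # Part 2: construct the pipeline via part 1 and execute it. A user
           # can instead grab the pipeline, tweak the pass list, and apply it.
           return mixed_module_pass_pipeline(target)(mod)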





[GitHub] [tvm] mikepapadim commented on pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
mikepapadim commented on pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#issuecomment-942512206


   @mbs-octoml @jroesch @tqchen  this can be merged now
   





[GitHub] [tvm] tqchen commented on pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#issuecomment-937750416


   Thanks folks. One thing that would be very helpful is to split the logic into two parts:
   - A function that configures and returns the composite Pass (aka the Sequential)
   - A function that uses the above function to construct the composite pass and executes the optimization.
   
   As part of the python-first principle, we want to enable ways to access and tweak optimizations in python easily, while the majority of the default logic can sit in C++. It would be super helpful to have a python function that returns the composite pipeline, so that we can do further customizations (in python) if necessary and then invoke it.





[GitHub] [tvm] mbs-octoml commented on a change in pull request #9103: Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
mbs-octoml commented on a change in pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#discussion_r715708496



##########
File path: include/tvm/driver/driver_api.h
##########
@@ -44,6 +44,31 @@
 
 namespace tvm {
 
+/*!
+ * \brief Returns the optimized IRModule for original fused module (pre split) that contains device
+ * and host code.
+ * \param mixed_mod The original mixed module.
+ * \param target The device Target.
+ * \return The result optimized mixed module.
+ */
+IRModule OptimizeMixedModule(IRModule mixed_mod, Target target);
+
+/*!
+ * \brief Returns the optimized IRModule for the device Target after device/host from mixed module.

Review comment:
       nit: after device/host splitting
   
   Worth making clearer that the result is just for target (assuming that's the case)? Ditto for the host one below.

##########
File path: apps/ios_rpc/README.md
##########
@@ -79,7 +79,7 @@ You can get value of your `team_id` in the following ways:
   select target `tvmrpc`. At the bottom of this panel go to `Signing &
   Capabilities` tab and in the field `Team` select your local developer profile
   (`Your Name (Personal Team)`).
-  
+

Review comment:
       nit: let's just revert these too to demonstrate good practice.

##########
File path: include/tvm/target/codegen.h
##########
@@ -45,7 +45,7 @@ using runtime::TVMRetValue;
  * \param target The target to be built.

Review comment:
       nit: Might as well update the comment -- we're compiling the PrimFuncs in IRModule tagged for target to a runtime::Module. 

##########
File path: python/tvm/driver/build_module.py
##########
@@ -234,18 +185,14 @@ def build(
 ):
     """Build a function with arguments as signature. Code will be generated
     for devices coupled with target information.
-

Review comment:
       nit: did these lines get deleted by black?

##########
File path: python/tvm/driver/build_module.py
##########
@@ -123,6 +122,7 @@ def lower(
     m : IRModule
        The result IRModule
     """
+    # ffi.relay.lower_te_pass()

Review comment:
       nit: intentional?

##########
File path: python/tvm/driver/build_module.py
##########
@@ -297,28 +235,27 @@ def build(
           m1 = tvm.lower(s1, [A, B, C], name="test_add1")
           m2 = tvm.lower(s2, [A, B, C], name="test_add2")
           rt_mod = tvm.build({"llvm": m1, "cuda": m2}, target_host="llvm")
-
     Note
     ----
     See the note on :any:`tvm.target` on target string format.
     """
-    if isinstance(inputs, schedule.Schedule):
-        if args is None:
-            raise ValueError("args must be given for build from schedule")
+
+    # Lowering
+    if isinstance(inputs, (schedule.Schedule, tvm.IRModule, PrimFunc)):
+        # should this be te_lower instead?
         input_mod = lower(inputs, args, name=name, binds=binds)
     elif isinstance(inputs, (list, tuple, container.Array)):
         merged_mod = tvm.IRModule({})
         for x in inputs:
             merged_mod.update(lower(x))
         input_mod = merged_mod
-    elif isinstance(inputs, (tvm.IRModule, PrimFunc)):
-        input_mod = lower(inputs)
     elif not isinstance(inputs, (dict, container.Map)):
         raise ValueError(
-            f"Inputs must be Schedule, IRModule or dict of target to IRModule, "
+            f"Inputs must be Schedule, PrimFunc, IRModule or dict of target to IRModule, "
             f"but got {type(inputs)}."
         )
 
+    # starts here

Review comment:
       nit: maybe sweep for leftover comments

##########
File path: python/tvm/driver/build_module.py
##########
@@ -149,79 +149,30 @@ def _build_for_device(input_mod, target, target_host):
 

Review comment:
       (Is there a way to tell github to allow comments on unchanged sections?)
   nit: Still using 'schedule' in the comment above.

##########
File path: src/driver/driver_api.cc
##########
@@ -373,88 +381,119 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+// Splits the module into one to run on the device and one to run on the host, e.g. for CUDA, OpenCL, etc.
+std::pair<IRModule, IRModule> SplitFuncsToDevHostMods(IRModule mod_mixed, const Target& target_arg,

Review comment:
       Might as well use the Doxygen-friendly comment format and give this a proper introduction.

##########
File path: src/driver/driver_api.cc
##########
@@ -530,12 +570,108 @@ runtime::Module build(const Map<String, IRModule>& inputs_arg, const Target& tar
 }
 
 // Build for homogeneous execution.
+// Where is this called from?]
+// called from compile engine and it accepts lowered functions
 runtime::Module build(const IRModule& funcs, const Target& target_arg,
                       const Target& target_host_arg) {
   auto target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
+  // More maps of target and target host
   Map<Target, IRModule> inputs = {{target, funcs}};
   return build(inputs, target_host);
 }
 
+IRModule OptimizeMixedModule(IRModule mixed_mod, Target target) {
+  transform::PassContext pass_ctx = transform::PassContext::Current();
+
+  Array<transform::Pass> mixed_pass_list;
+
+  mixed_pass_list.push_back(BindTarget(target));
+
+  bool is_entry_func = false;
+  if (mixed_mod->functions.size() == 1) {
+    is_entry_func = pass_ctx->GetConfig<Bool>("tir.is_entry_func", Bool(true)).value();
+    mixed_pass_list.push_back(AnnotateEntryFunc(is_entry_func));
+  }
+
+  mixed_pass_list.push_back(tir::transform::VerifyMemory());
+  mixed_pass_list.push_back(tir::transform::MergeDynamicSharedMemoryAllocations());

Review comment:
       From here on looks good as it's a mechanical transfer from the py.

##########
File path: src/driver/driver_api.cc
##########
@@ -530,12 +570,108 @@ runtime::Module build(const Map<String, IRModule>& inputs_arg, const Target& tar
 }
 
 // Build for homogeneous execution.
+// Where is this called from?]
+// called from compile engine and it accepts lowered functions
 runtime::Module build(const IRModule& funcs, const Target& target_arg,
                       const Target& target_host_arg) {
   auto target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
+  // More maps of target and target host
   Map<Target, IRModule> inputs = {{target, funcs}};
   return build(inputs, target_host);
 }
 
+IRModule OptimizeMixedModule(IRModule mixed_mod, Target target) {
+  transform::PassContext pass_ctx = transform::PassContext::Current();
+
+  Array<transform::Pass> mixed_pass_list;
+
+  mixed_pass_list.push_back(BindTarget(target));
+
+  bool is_entry_func = false;
+  if (mixed_mod->functions.size() == 1) {
+    is_entry_func = pass_ctx->GetConfig<Bool>("tir.is_entry_func", Bool(true)).value();

Review comment:
       I know you've transliterated this from `_build_for_device`, but it doesn't make sense to me: if the module has a single function with a single `tir.is_entry_func=true` annotation, this applies the same annotation to the same function. I think it's a no-op.

##########
File path: src/driver/driver_api.cc
##########
@@ -373,88 +381,119 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+// Splits the module into one to run on the device and one to run on the host, e.g. for CUDA, OpenCL, etc.
+std::pair<IRModule, IRModule> SplitFuncsToDevHostMods(IRModule mod_mixed, const Target& target_arg,
+                                                      const Target& target_host_arg) {
   Target target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
-  Array<tvm::transform::Pass> mixed_pass_list = {BindTarget(target),
-                                                 tir::transform::VerifyMemory()};
 
-  mixed_pass_list.push_back(tir::transform::MergeDynamicSharedMemoryAllocations());
-  if (pass_ctx->GetConfig<Bool>("tir.detect_global_barrier", Bool(false)).value()) {
-    mixed_pass_list.push_back(tir::transform::ThreadSync("global"));
-  }
-  mixed_pass_list.push_back(tir::transform::ThreadSync("shared"));
-  mixed_pass_list.push_back(tir::transform::ThreadSync("warp"));
-  mixed_pass_list.push_back(tir::transform::InferFragment());
-  mixed_pass_list.push_back(tir::transform::LowerThreadAllreduce());
+  ICHECK(mod_mixed.defined()) << "This module must be defined";
 
-  if (target->GetAttr<Bool>("unpacked-api").value_or(Bool(false))) {
-    mixed_pass_list.push_back(tir::transform::MakeUnpackedAPI());
-  } else {
-    mixed_pass_list.push_back(tir::transform::MakePackedAPI(-1));
-  }
+  mod_mixed = OptimizeMixedModule(mod_mixed, target);
 
-  mixed_pass_list.push_back(tir::transform::SplitHostDevice());
+  auto host_mod = OptimizeHostModule(mod_mixed, target_host);
 
-  auto opt_mixed = transform::Sequential(mixed_pass_list);
-  mod_mixed = opt_mixed(std::move(mod_mixed));
-
-  auto host_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) !=
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target_host),
-      tir::transform::LowerTVMBuiltin(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-      tir::transform::CombineContextCall(),
-  };
-  auto opt_host = transform::Sequential(host_pass_list);
-  ICHECK(mod_mixed.defined()) << "This module must be defined";
-  auto mhost = opt_host(mod_mixed);
-
-  // device pipeline
-  auto device_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) ==
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target),
-      tir::transform::LowerWarpMemory(),
-      tir::transform::Simplify(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-  };
-  auto opt_device = transform::Sequential(device_pass_list);
-  auto mdevice = opt_device(mod_mixed);
+  auto device_mod = OptimizeDeviceModule(mod_mixed, target);
 
   // some final misc checks.
   auto keys = target->GetKeys();
   bool target_is_gpu = std::find(keys.begin(), keys.end(), "gpu") != keys.end();
-  if (target_is_gpu && mdevice->functions.size() == 0) {
+  if (target_is_gpu && device_mod->functions.size() == 0) {
     LOG(WARNING) << "Specified target " << target->str()
                  << " but cannot find device code. Did you forget to bind?";
   }
 
-  if (target->kind->device_type == kDLCPU && target_host == target) {
-    // TODO(@jroesch): This check is no longer true we need to figure out if we care about this.
-    // We need to relax this check for just TIR functions.
-    // ICHECK(mdevice->functions.empty()) << "No device code should be generated when target "
-    //                                   << "and host_target are both llvm target."
-    //                                   << "\n";
+  return {host_mod, device_mod};
+}
+
+std::pair<Target, Target> TargetTypeMangling(const Map<Target, IRModule>& inputs_arg, Target target,

Review comment:
       Full comments please.

##########
File path: src/driver/driver_api.cc
##########
@@ -373,88 +381,119 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+// Splits the module into one to run on the device and one to run on the host, e.g. for CUDA, OpenCL, etc.
+std::pair<IRModule, IRModule> SplitFuncsToDevHostMods(IRModule mod_mixed, const Target& target_arg,
+                                                      const Target& target_host_arg) {
   Target target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
-  Array<tvm::transform::Pass> mixed_pass_list = {BindTarget(target),
-                                                 tir::transform::VerifyMemory()};
 
-  mixed_pass_list.push_back(tir::transform::MergeDynamicSharedMemoryAllocations());
-  if (pass_ctx->GetConfig<Bool>("tir.detect_global_barrier", Bool(false)).value()) {
-    mixed_pass_list.push_back(tir::transform::ThreadSync("global"));
-  }
-  mixed_pass_list.push_back(tir::transform::ThreadSync("shared"));
-  mixed_pass_list.push_back(tir::transform::ThreadSync("warp"));
-  mixed_pass_list.push_back(tir::transform::InferFragment());
-  mixed_pass_list.push_back(tir::transform::LowerThreadAllreduce());
+  ICHECK(mod_mixed.defined()) << "This module must be defined";
 
-  if (target->GetAttr<Bool>("unpacked-api").value_or(Bool(false))) {
-    mixed_pass_list.push_back(tir::transform::MakeUnpackedAPI());
-  } else {
-    mixed_pass_list.push_back(tir::transform::MakePackedAPI(-1));
-  }
+  mod_mixed = OptimizeMixedModule(mod_mixed, target);
 
-  mixed_pass_list.push_back(tir::transform::SplitHostDevice());
+  auto host_mod = OptimizeHostModule(mod_mixed, target_host);
 
-  auto opt_mixed = transform::Sequential(mixed_pass_list);
-  mod_mixed = opt_mixed(std::move(mod_mixed));
-
-  auto host_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) !=
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target_host),
-      tir::transform::LowerTVMBuiltin(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-      tir::transform::CombineContextCall(),
-  };
-  auto opt_host = transform::Sequential(host_pass_list);
-  ICHECK(mod_mixed.defined()) << "This module must be defined";
-  auto mhost = opt_host(mod_mixed);
-
-  // device pipeline
-  auto device_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) ==
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target),
-      tir::transform::LowerWarpMemory(),
-      tir::transform::Simplify(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-  };
-  auto opt_device = transform::Sequential(device_pass_list);
-  auto mdevice = opt_device(mod_mixed);
+  auto device_mod = OptimizeDeviceModule(mod_mixed, target);
 
   // some final misc checks.
   auto keys = target->GetKeys();
   bool target_is_gpu = std::find(keys.begin(), keys.end(), "gpu") != keys.end();
-  if (target_is_gpu && mdevice->functions.size() == 0) {
+  if (target_is_gpu && device_mod->functions.size() == 0) {
     LOG(WARNING) << "Specified target " << target->str()
                  << " but cannot find device code. Did you forget to bind?";
   }
 
-  if (target->kind->device_type == kDLCPU && target_host == target) {
-    // TODO(@jroesch): This check is no longer true we need to figure out if we care about this.
-    // We need to relax this check for just TIR functions.
-    // ICHECK(mdevice->functions.empty()) << "No device code should be generated when target "
-    //                                   << "and host_target are both llvm target."
-    //                                   << "\n";
+  return {host_mod, device_mod};
+}
+
+std::pair<Target, Target> TargetTypeMangling(const Map<Target, IRModule>& inputs_arg, Target target,
+                                             Target target_host_arg) {
+  Target target_input_mod, target_host;
+
+  target = !target.defined() ? target.Current() : target;

Review comment:
       I've gotten lost at this point: I had no idea we had a global target context stack and don't understand how that interacts with all the target arguments we're using here.
   
   Since you've gone pretty deep into this now, could you write some comments explaining the mangling and finalization steps? Then I can try again.
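   
   (For anyone else catching up, I believe the global stack is the python-side `Target` context manager; a rough, untested sketch of the interaction:)
   
       import tvm
   
       # With no enclosing context there is no implicit target...
       assert tvm.target.Target.current(allow_none=True) is None
   
       # ...but inside a `with` block, Target.current() supplies one, which
       # seems to be what the C++ `target.Current()` fallback above picks up.
       with tvm.target.Target("llvm"):
           print(tvm.target.Target.current())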







[GitHub] [tvm] jroesch merged pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
jroesch merged pull request #9103:
URL: https://github.com/apache/tvm/pull/9103


   








[GitHub] [tvm] jroesch commented on a change in pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
jroesch commented on a change in pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#discussion_r728738291



##########
File path: src/driver/driver_api.cc
##########
@@ -373,97 +394,96 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+/**
+ * This function takes the input module that contains both the device and host parts.
+ * It applies transformations to the original module before splitting it into separate
+ * modules for device and host, and then applies transformations to the newly split modules.
+ */
+std::pair<IRModule, IRModule> SplitMixedModule(IRModule mod_mixed, const Target& target_arg,
+                                               const Target& target_host_arg) {
   Target target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
-  Array<tvm::transform::Pass> mixed_pass_list = {BindTarget(target),
-                                                 tir::transform::VerifyMemory()};
 
-  mixed_pass_list.push_back(tir::transform::MergeDynamicSharedMemoryAllocations());
-  if (pass_ctx->GetConfig<Bool>("tir.detect_global_barrier", Bool(false)).value()) {
-    mixed_pass_list.push_back(tir::transform::ThreadSync("global"));
-  }
-  mixed_pass_list.push_back(tir::transform::ThreadSync("shared"));
-  mixed_pass_list.push_back(tir::transform::ThreadSync("warp"));
-  mixed_pass_list.push_back(tir::transform::InferFragment());
-  mixed_pass_list.push_back(tir::transform::LowerThreadAllreduce());
+  ICHECK(mod_mixed.defined()) << "This module must be defined";
 
-  if (target->GetAttr<Bool>("unpacked-api").value_or(Bool(false))) {
-    mixed_pass_list.push_back(tir::transform::MakeUnpackedAPI());
-  } else {
-    mixed_pass_list.push_back(tir::transform::MakePackedAPI(-1));
-  }
+  mod_mixed = ApplyPasses(mod_mixed, MixedModulePassManager(mod_mixed, target));
 
-  mixed_pass_list.push_back(tir::transform::SplitHostDevice());
+  IRModule host_mod = ApplyPasses(mod_mixed, HostModulePassManager(mod_mixed, target_host));
 
-  auto opt_mixed = transform::Sequential(mixed_pass_list);
-  mod_mixed = opt_mixed(std::move(mod_mixed));
-
-  // We make an assumption here that the overriden host target
-  // can be used alongside the default host codegen based on device type
-  // this is so the correct code generator is used later instead of overriding the target.
-  // We need better support for inserting multiple kDLCPU targets as our current options
-  // are kDeviceKernelLaunch or not
-  Target overriden_host_target = target_host;
-  if (target->kind->device_type == target_host->kind->device_type) {
-    overriden_host_target = target;
-  }
-  auto host_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) !=
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(overriden_host_target),
-      tir::transform::LowerTVMBuiltin(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-      tir::transform::CombineContextCall(),
-  };
-  auto opt_host = transform::Sequential(host_pass_list);
-  ICHECK(mod_mixed.defined()) << "This module must be defined";
-  auto mhost = opt_host(mod_mixed);
-
-  // device pipeline
-  auto device_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) ==
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target),
-      tir::transform::LowerWarpMemory(),
-      tir::transform::Simplify(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-  };
-  auto opt_device = transform::Sequential(device_pass_list);
-  auto mdevice = opt_device(mod_mixed);
+  IRModule device_mod = ApplyPasses(mod_mixed, DeviceModulePassManager(mod_mixed, target));
 
-  // some final misc checks.
   auto keys = target->GetKeys();
+
+  CheckAndUpdateHostConsistency(&target, &target_host);
+
   bool target_is_gpu = std::find(keys.begin(), keys.end(), "gpu") != keys.end();
-  if (target_is_gpu && mdevice->functions.size() == 0) {
-    LOG(WARNING) << "Specified target " << target->str()
-                 << " but cannot find device code. Did you forget to bind?";
+  if (target_is_gpu && device_mod->functions.size() == 0) {
+    DLOG(WARNING) << "Specified target " << target->str()
+                  << " but cannot find device code. Did you forget to bind?";
+  }
+
+  return {host_mod, device_mod};
+}
+
+runtime::Module FinalizeModule(const Map<Target, IRModule>& inputs_arg, const Target& host_target) {

Review comment:
       I think we should probably come up with a better name for this function; "finalize" has too many technical meanings, so it would be better to call it something that implies what is going on. cc @mbs-octoml: going to land as is; maybe we can make a note to follow up on this.







[GitHub] [tvm] Mousius commented on a change in pull request #9103: Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
Mousius commented on a change in pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#discussion_r715590473



##########
File path: python/tvm/relay/build_module.py
##########
@@ -282,6 +282,8 @@ def get_executor_from_target(target, target_host):
     return executor
 
 
+# Which build is this one... Relay --> graph executor
+# can params being parsed during run-time?

Review comment:
       I don't think these comments should be here?

##########
File path: python/tvm/driver/build_module.py
##########
@@ -123,6 +123,7 @@ def lower(
     m : IRModule
        The result IRModule
     """
+    # ffi.relay.lower_te_pass()

Review comment:
       I don't think this comment is necessary?

##########
File path: python/tvm/target/target.py
##########
@@ -174,6 +174,7 @@ def list_kinds():
         """Returns the list of available target names."""
         return list(_ffi_api.ListTargetKinds())
 
+    # TODO: make this return IRModule? idk it seems

Review comment:
       This thought is unfinished

##########
File path: python/tvm/driver/build_module.py
##########
@@ -297,25 +238,23 @@ def build(
           m1 = tvm.lower(s1, [A, B, C], name="test_add1")
           m2 = tvm.lower(s2, [A, B, C], name="test_add2")
           rt_mod = tvm.build({"llvm": m1, "cuda": m2}, target_host="llvm")
-
     Note
     ----
     See the note on :any:`tvm.target` on target string format.
     """
-    if isinstance(inputs, schedule.Schedule):
-        if args is None:
-            raise ValueError("args must be given for build from schedule")
+
+    # Lowering
+    if isinstance(inputs, (schedule.Schedule, tvm.IRModule, PrimFunc)):
+        # should this be te_lower instead?

Review comment:
       Also these comments.

##########
File path: python/tvm/target/codegen.py
##########
@@ -35,8 +35,11 @@ def build_module(mod, target):
     module : runtime.Module
         The corresponding module.
     """
+    print("In codegen build module")

Review comment:
       Remove this `print()`

##########
File path: python/tvm/driver/build_module.py
##########
@@ -149,79 +150,32 @@ def _build_for_device(input_mod, target, target_host):
 
     Returns
     -------
-    fhost : IRModule
+    host_mod : IRModule
         The host IRModule.
 
-    mdev : tvm.module
+    device_mod : tvm.module
         A module that contains device code.
     """
-    target, target_host = Target.check_and_update_host_consist(target, target_host)
-    device_type = ndarray.device(target.kind.name, 0).device_type
+    from tvm.driver import _ffi_api as _driver_ffi
 
-    mod_mixed = input_mod
-    mod_mixed = tvm.tir.transform.Apply(lambda f: f.with_attr("target", target))(mod_mixed)
-
-    opt_mixed = [
-        tvm.tir.transform.VerifyMemory(),
-        tvm.tir.transform.MergeDynamicSharedMemoryAllocations(),
-    ]
-    if len(mod_mixed.functions) == 1:
-        opt_mixed += [tvm.tir.transform.Apply(lambda f: f.with_attr("tir.is_entry_func", True))]
-
-    if PassContext.current().config.get("tir.detect_global_barrier", False):
-        opt_mixed += [tvm.tir.transform.ThreadSync("global")]
-    opt_mixed += [
-        tvm.tir.transform.ThreadSync("shared"),
-        tvm.tir.transform.ThreadSync("warp"),
-        tvm.tir.transform.InferFragment(),
-        tvm.tir.transform.LowerThreadAllreduce(),
-        tvm.tir.transform.MakePackedAPI(),
-        tvm.tir.transform.SplitHostDevice(),
-    ]
-    mod_mixed = tvm.transform.Sequential(opt_mixed)(mod_mixed)
-
-    # device optimizations
-    opt_device = tvm.transform.Sequential(
-        [
-            tvm.tir.transform.Filter(
-                lambda f: "calling_conv" in f.attrs
-                and f.attrs["calling_conv"].value == CallingConv.DEVICE_KERNEL_LAUNCH
-            ),
-            tvm.tir.transform.LowerWarpMemory(),
-            tvm.tir.transform.Simplify(),
-            tvm.tir.transform.LowerDeviceStorageAccessInfo(),
-            tvm.tir.transform.LowerCustomDatatypes(),
-            tvm.tir.transform.LowerIntrin(),
-        ]
-    )
-    mod_dev = opt_device(mod_mixed)
-
-    # host optimizations
-    opt_host = tvm.transform.Sequential(
-        [
-            tvm.tir.transform.Filter(
-                lambda f: "calling_conv" not in f.attrs
-                or f.attrs["calling_conv"].value != CallingConv.DEVICE_KERNEL_LAUNCH
-            ),
-            tvm.tir.transform.Apply(lambda f: f.with_attr("target", target_host)),
-            tvm.tir.transform.LowerTVMBuiltin(),
-            tvm.tir.transform.LowerDeviceStorageAccessInfo(),
-            tvm.tir.transform.LowerCustomDatatypes(),
-            tvm.tir.transform.LowerIntrin(),
-            tvm.tir.transform.CombineContextCall(),
-        ]
-    )
-    mod_host = opt_host(mod_mixed)
+    mod_mixed = _driver_ffi.get_mod_mixed(input_mod, target)
+    device_mod = _driver_ffi.get_device_mod(mod_mixed, target)
+    host_mod = _driver_ffi.get_host_mod(mod_mixed, target_host)
 
+    device_type = ndarray.device(target.kind.name, 0).device_type
     if device_type == ndarray.cpu(0).device_type and target_host == target:
-        assert len(mod_dev.functions) == 0
-    if "gpu" in target.keys and len(mod_dev.functions) == 0:
+        assert len(device_mod.functions) == 0
+    if "gpu" in target.keys and len(device_mod.functions) == 0:
         warnings.warn(
             "Specified target %s, but cannot find device code, did you do " "bind?" % target
         )
 
-    rt_mod_dev = codegen.build_module(mod_dev, target) if len(mod_dev.functions) != 0 else None
-    return mod_host, rt_mod_dev
+    # rt_mod_dev is runtime::Module so this can be moved out maybe?
+    rt_mod_dev = (
+        codegen.build_module(device_mod, target) if len(device_mod.functions) != 0 else None
+    )
+    # TIR module for the host, runtime module for devices?

Review comment:
       Do we need to resolve these questions?













[GitHub] [tvm] mikepapadim commented on pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
mikepapadim commented on pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#issuecomment-938630828


   PTAL.
   If it is in a good state, we can have a follow-up PR just for adding the flexibility to add passes to the separate module pass managers from the python side.
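   
   Roughly, that follow-up would let users do something like this (hypothetical usage; `MixedModulePassManager` is the C++ helper from this PR and is not exposed to python yet):
   
       import tvm
   
   
       @tvm.tir.transform.prim_func_pass(opt_level=0)
       def print_func_attrs(func, mod, ctx):
           # Example user pass: observe each PrimFunc, return it unchanged.
           print(func.attrs)
           return func
   
       # Hypothetical: fetch the default mixed-module pipeline, append the
       # custom pass, then run the combined pipeline over the module.
       # seq = tvm.transform.Sequential([default_mixed_pipeline, print_func_attrs])
       # mod = seq(mod)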





[GitHub] [tvm] mikepapadim commented on a change in pull request #9103: [Core][Build] Move build module transformations and utilities to C++

Posted by GitBox <gi...@apache.org>.
mikepapadim commented on a change in pull request #9103:
URL: https://github.com/apache/tvm/pull/9103#discussion_r728774603



##########
File path: src/driver/driver_api.cc
##########
@@ -373,97 +394,96 @@ TVM_REGISTER_GLOBAL("driver.lower_schedule")
       return LowerSchedule(std::move(sch), args, name, c_binds, simple_mode);
     });
 
-std::pair<IRModule, IRModule> SplitDevHostFuncs(IRModule mod_mixed, const Target& target_arg,
-                                                const Target& target_host_arg,
-                                                const transform::PassContext& pass_ctx) {
+/**
+ * This function takes the input module that contains both the device and host parts.
+ * It applies transformations to the original module before splitting it into separate
+ * modules for device and host, and then applies transformations to the newly split modules.
+ */
+std::pair<IRModule, IRModule> SplitMixedModule(IRModule mod_mixed, const Target& target_arg,
+                                               const Target& target_host_arg) {
   Target target = target_arg, target_host = target_host_arg;
   CheckAndUpdateHostConsistency(&target, &target_host);
-  Array<tvm::transform::Pass> mixed_pass_list = {BindTarget(target),
-                                                 tir::transform::VerifyMemory()};
 
-  mixed_pass_list.push_back(tir::transform::MergeDynamicSharedMemoryAllocations());
-  if (pass_ctx->GetConfig<Bool>("tir.detect_global_barrier", Bool(false)).value()) {
-    mixed_pass_list.push_back(tir::transform::ThreadSync("global"));
-  }
-  mixed_pass_list.push_back(tir::transform::ThreadSync("shared"));
-  mixed_pass_list.push_back(tir::transform::ThreadSync("warp"));
-  mixed_pass_list.push_back(tir::transform::InferFragment());
-  mixed_pass_list.push_back(tir::transform::LowerThreadAllreduce());
+  ICHECK(mod_mixed.defined()) << "This module must be defined";
 
-  if (target->GetAttr<Bool>("unpacked-api").value_or(Bool(false))) {
-    mixed_pass_list.push_back(tir::transform::MakeUnpackedAPI());
-  } else {
-    mixed_pass_list.push_back(tir::transform::MakePackedAPI(-1));
-  }
+  mod_mixed = ApplyPasses(mod_mixed, MixedModulePassManager(mod_mixed, target));
 
-  mixed_pass_list.push_back(tir::transform::SplitHostDevice());
+  IRModule host_mod = ApplyPasses(mod_mixed, HostModulePassManager(mod_mixed, target_host));
 
-  auto opt_mixed = transform::Sequential(mixed_pass_list);
-  mod_mixed = opt_mixed(std::move(mod_mixed));
-
-  // We make an assumption here that the overriden host target
-  // can be used alongside the default host codegen based on device type
-  // this is so the correct code generator is used later instead of overriding the target.
-  // We need better support for inserting multiple kDLCPU targets as our current options
-  // are kDeviceKernelLaunch or not
-  Target overriden_host_target = target_host;
-  if (target->kind->device_type == target_host->kind->device_type) {
-    overriden_host_target = target;
-  }
-  auto host_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) !=
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(overriden_host_target),
-      tir::transform::LowerTVMBuiltin(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-      tir::transform::CombineContextCall(),
-  };
-  auto opt_host = transform::Sequential(host_pass_list);
-  ICHECK(mod_mixed.defined()) << "This module must be defined";
-  auto mhost = opt_host(mod_mixed);
-
-  // device pipeline
-  auto device_pass_list = {
-      Filter([](const tir::PrimFunc& f) {
-        return f->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) ==
-               CallingConv::kDeviceKernelLaunch;
-      }),
-      BindTarget(target),
-      tir::transform::LowerWarpMemory(),
-      tir::transform::Simplify(),
-      tir::transform::LowerCustomDatatypes(),
-      tir::transform::LowerIntrin(),
-      tir::transform::LowerDeviceStorageAccessInfo(),
-  };
-  auto opt_device = transform::Sequential(device_pass_list);
-  auto mdevice = opt_device(mod_mixed);
+  IRModule device_mod = ApplyPasses(mod_mixed, DeviceModulePassManager(mod_mixed, target));
 
-  // some final misc checks.
   auto keys = target->GetKeys();
+
+  CheckAndUpdateHostConsistency(&target, &target_host);
+
   bool target_is_gpu = std::find(keys.begin(), keys.end(), "gpu") != keys.end();
-  if (target_is_gpu && mdevice->functions.size() == 0) {
-    LOG(WARNING) << "Specified target " << target->str()
-                 << " but cannot find device code. Did you forget to bind?";
+  if (target_is_gpu && device_mod->functions.size() == 0) {
+    DLOG(WARNING) << "Specified target " << target->str()
+                  << " but cannot find device code. Did you forget to bind?";
+  }
+
+  return {host_mod, device_mod};
+}
+
+runtime::Module FinalizeModule(const Map<Target, IRModule>& inputs_arg, const Target& host_target) {

Review comment:
       Sure, IMO there is some more cleanup to be done in `build_module`.






