Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/15 07:07:21 UTC

[GitHub] [tvm] masahi commented on a change in pull request #7099: [CUDA] Parallel Cuda Mergesort

masahi commented on a change in pull request #7099:
URL: https://github.com/apache/tvm/pull/7099#discussion_r543096686



##########
File path: python/tvm/driver/build_module.py
##########
@@ -277,7 +277,7 @@ def _build_for_device(input_mod, target, target_host):
                 lambda f: "calling_conv" not in f.attrs
                 or f.attrs["calling_conv"].value != CallingConv.DEVICE_KERNEL_LAUNCH
             ),
-            tvm.tir.transform.Apply(lambda f: f.with_attr("target", target)),
+            tvm.tir.transform.Apply(lambda f: f.with_attr("target", target_host)),

Review comment:
       For the record, the segfault with nvptx happened because the generated host code was calling intrinsics registered for nvptx, such as `__nv_log2` or `__nv_ceil`. It worked on CUDA only by coincidence: no CUDA intrinsics are registered for fp64 log2 or ceil.
   
   This change, which tags the host-side functions with `target_host` instead of `target`, fixes the issue I mentioned above.
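   The failure mode can be illustrated with a toy model of target-keyed intrinsic lookup. This is a hypothetical sketch, not TVM's lowering code: `INTRINSIC_RULES`, `lower_call`, and `lower_host_function` are invented for illustration, and the table only mirrors the two nvptx intrinsics named above. It shows why host functions tagged with the device target pick up `__nv_*` symbols under nvptx, yet come out clean under CUDA simply because no matching rule exists there.

```python
# Hypothetical, simplified model of target-keyed intrinsic lowering.
# The table contents are illustrative assumptions, not TVM's actual registry.
INTRINSIC_RULES = {
    ("nvptx", "log2", "float64"): "__nv_log2",
    ("nvptx", "ceil", "float64"): "__nv_ceil",
    # Assumed: no fp64 log2/ceil rules are registered for "cuda".
}


def lower_call(target, op, dtype):
    """Return the symbol a call lowers to for the given target.

    Without a target-specific rule, the generic op name is kept, which the
    host backend can compile on its own.
    """
    return INTRINSIC_RULES.get((target, op, dtype), op)


def lower_host_function(ops, func_target):
    """Lower every (op, dtype) call in a host function for func_target."""
    return [lower_call(func_target, op, dtype) for op, dtype in ops]


host_ops = [("log2", "float64"), ("ceil", "float64")]

# Before the fix: host code is tagged with the device target.
print(lower_host_function(host_ops, "nvptx"))  # ['__nv_log2', '__nv_ceil']
print(lower_host_function(host_ops, "cuda"))   # ['log2', 'ceil']
# After the fix: host code is tagged with target_host (e.g. llvm).
print(lower_host_function(host_ops, "llvm"))   # ['log2', 'ceil']
```

   The CUDA case only looks correct because its lookup misses; tagging host functions with `target_host` makes the result independent of which device intrinsics happen to be registered.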



