Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/15 15:50:21 UTC

[GitHub] [incubator-mxnet] leezu commented on issue #18716: [RFC] Use TVMOp with GPU & Build without libcuda.so in CI

leezu commented on issue #18716:
URL: https://github.com/apache/incubator-mxnet/issues/18716#issuecomment-658846227

> Violates the effort of removing libcuda.so totally, (would be great if someone can elaborate the motivation behind it).

Many customers use a single mxnet build that supports gpu features and deploy it to both gpu and cpu machines. Due to the way cuda containers are designed, libcuda.so won't be present on the cpu machines. That's why it's better to dlopen libcuda.so only when it is actually needed. This affects not only tvmop but also the nvrtc feature in mxnet.
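
To illustrate the idea of loading the driver only when needed, here is a minimal sketch in Python using ctypes (which calls dlopen on Linux). It is not how mxnet implements or would implement this; `LazyLibrary` and `gpu_available` are hypothetical names made up for the example, and the only real driver symbol used is `cuInit` from the CUDA driver API.

```python
import ctypes
from typing import Optional


class LazyLibrary:
    """Hypothetical helper: open a shared library on first use, not at import time."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._handle: Optional[ctypes.CDLL] = None
        self._tried = False

    def get(self) -> Optional[ctypes.CDLL]:
        # Attempt the load only once; remember the outcome either way.
        if not self._tried:
            self._tried = True
            try:
                self._handle = ctypes.CDLL(self._name)  # dlopen on Linux
            except OSError:
                # Library is not installed, e.g. on a cpu-only machine.
                self._handle = None
        return self._handle


# Nothing is loaded here; the driver is only looked up when get() is called.
_cuda_driver = LazyLibrary("libcuda.so.1")


def gpu_available() -> bool:
    """Return True only if the driver library loads and cuInit(0) succeeds."""
    lib = _cuda_driver.get()
    if lib is None:
        return False
    # cuInit returns 0 (CUDA_SUCCESS) when the driver initializes.
    return lib.cuInit(0) == 0
```

With this pattern, importing the module succeeds on a cpu machine without libcuda.so, and code that never calls `gpu_available()` never touches the driver.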

Using the stubs is a workaround that avoids dlopen, but it additionally requires modifying LD_LIBRARY_PATH on users' cpu machines. That's not always feasible for users, and for mxnet 1.6, which introduced nvrtc, users typically just disable the nvrtc feature to be able to deploy libmxnet.so to both cpu and gpu machines.

Why not fix the underlying problem first and then enable the tvmop feature?

> Also, When setting -DUSE_TVM_OP=OFF the CI checks would be stuck.

That doesn't make sense, as we have been running CI successfully with tvm op disabled for a couple of months. Maybe you ran into some unrelated flakiness and need to retrigger the run?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org