Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/15 20:46:30 UTC

[GitHub] [incubator-mxnet] apeforest commented on issue #17292: Can't run horovod with latest nightly wheel

URL: https://github.com/apache/incubator-mxnet/issues/17292#issuecomment-574848779
 
 
   Thanks @stephenrawls for the analysis.
   Here is the root cause of the problem:
   
   1) Horovod uses MX_API_BEGIN() and MX_API_END() from mxnet/c_api_error.h to catch and propagate errors in the Horovod APIs: https://github.com/horovod/horovod/blob/master/horovod/mxnet/mpi_ops.cc#L224
   2) MX_API_BEGIN() and MX_API_END() are macros that together wrap the function body in a try/catch block, and the catch clause calls MXAPIHandleException: https://github.com/apache/incubator-mxnet/blob/master/include/mxnet/c_api_error.h#L36
   3) MXAPIHandleException was already an inline function before #17128, so its body is compiled directly into Horovod's own binary. Therefore, when #17128 introduced a call to a new function, NormalizeError(), inside MXAPIHandleException, it broke the Horovod integration: Horovod's binary now referenced NormalizeError, whose symbol is not whitelisted (exported) by the MXNet distribution.
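   4) #17208 removed NormalizeError() from MXAPIHandleException and made MXAPIHandleException non-inline: https://github.com/apache/incubator-mxnet/pull/17208/files#diff-875aa4c013dbd73b044531e439e8afddR67. Because the function is no longer inlined, Horovod must now resolve MXAPIHandleException itself from the MXNet library, and this time the error became an undefined symbol for MXAPIHandleException.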
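   For readers less familiar with this pattern, here is a minimal, simplified C++ sketch of how header-defined error-handling macros like these typically work. All names (DEMO_API_BEGIN, DEMO_API_END, HandleApiException, GetLastApiError, DemoPushPull) are hypothetical stand-ins, not MXNet's or Horovod's actual code; the real MXNet macros may catch different exception types and do additional bookkeeping. What it illustrates is that the handler is an inline function defined in the header, so its definition is compiled into every library that expands the macros. Anything the handler calls (as NormalizeError was after #17128) therefore becomes a symbol that library must resolve from MXNet at load time, and once the handler is made non-inline, the handler's own symbol has to be exported instead.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for MXNet's per-thread "last error" message storage.
inline std::string& LastErrorStorage() {
  thread_local std::string last_error;
  return last_error;
}

// Hypothetical stand-in for MXAPIHandleException. It is `inline` and defined
// in the header, so its definition is compiled into every library that uses
// the macros below; any function it calls must be resolvable from there.
inline int HandleApiException(const std::exception& e) {
  LastErrorStorage() = e.what();
  return -1;  // C API convention assumed here: non-zero return means failure
}

// Hypothetical accessor for the stored message (similar in spirit to MXGetLastError).
inline const char* GetLastApiError() { return LastErrorStorage().c_str(); }

// Simplified sketch of the MX_API_BEGIN()/MX_API_END() pattern: BEGIN opens
// a try block, END closes it, routes any exception to the inline handler,
// and returns 0 on success.
#define DEMO_API_BEGIN() try {
#define DEMO_API_END()                      \
  }                                         \
  catch (const std::exception& _except_) {  \
    return HandleApiException(_except_);    \
  }                                         \
  return 0;

// Hypothetical C-style entry point, in the spirit of horovod/mxnet/mpi_ops.cc.
extern "C" int DemoPushPull(int value) {
  DEMO_API_BEGIN();
  if (value < 0) {
    throw std::invalid_argument("value must be non-negative");
  }
  DEMO_API_END();
}
```

   With this sketch, DemoPushPull(1) returns 0, while DemoPushPull(-1) returns -1 and leaves "value must be non-negative" as the message returned by GetLastApiError().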
   
