You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/08/11 16:19:22 UTC

[GitHub] [incubator-mxnet] samskalicky opened a new pull request #18904: [WIP] MXNet external operators

samskalicky opened a new pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904


   ## Description ##
   HUGE WIP!!!!!!!!! I will add RFC tag after a little more work to flush this idea out and start a discussion.
   
   I would like to propose another feature in MXNet Extensions for "_external ops_". Currently we have the following categories of operators in MXNet:
   - **builtin/backend operators** are those defined in the src/operator directory and are compiled into libmxnet.so
   - **Python custom operators** are custom operators written in Python by users (not part of MXNet source code)
   - **C++ custom operators** are custom operators written in C++ by users (not part of MXNet source code) that use the MXNet Extensions (lib_api.h)
   
   _External operators_ will be builtin/backend operators but not compiled into libmxnet.so. Instead, they will be compiled into a separate shared object and dynamically loaded at runtime similar to C++ custom operators. 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r472667585



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       Is this all I was missing?
   https://github.com/apache/incubator-mxnet/commit/04e88fb3c3f9daaa2bf7448b65678a20eff929be#diff-af3b638bc2a3e6c650974192a53c7291L741-R741




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky edited a comment on pull request #18904: MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680613651


   Dear community members,
   
   After further analysis and help from many people, I have decided that the best way forward is to put this work on hold while we spend some more time improving the C++ APIs of both MXNet and our dependencies (nnvm, dmlc-core). Once we have a stable, maintainable, versioned set of C++ APIs and a consistent process to build and link libmxnet.so with external libraries we'll revisit this proposal. 
   
   The major problem we ran into was that we couldnt decide to expose all symbols in libmxnet.so, so to be able to link a custom library with external operators in it we need to export only the specific symbols needed to build an operator (or other external components). And some of these symbols are coming from our dependencies like NNVM (ie. `NNVM_REGISTER_OP`). So we will need to ensure these symbols are exported from those projects as well. This will mean building a libnnvm.so and linking it with libmxnet.so so that it can be more easily maintained. Custom libraries will then be able to link against libnnvm.so and  libmxnet.so. 
   
   The other big problem was that we found that in order to compile an external operator you really had to use the main MXNet CMakeLists.txt to generate all the defines/includes/etc so that you can compile your library correctly. This was a huge pain, and ideally we would have a small set of includes that are opaque (ie. dont further include all of MXNet's includes) to use instead. 
   
   So we'll work towards both of these goals (symbol handling and includes) to reconfigure MXNet and its dependencies to have a cleaner C++ API. Once we have that then it will be easier to dynamically load libraries with external ops. 
   
   In the meantime, some of the refactoring work will continue on in another PR #19016. I will close this PR for now. Thanks for all of your input on this proposal!
   
   Sam


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] kpuatamazon commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
kpuatamazon commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674792591


   What sort of binary compatibility guarantees does MXNet make for this interface?  If the safe option is always to compile at the same time, I don't see much difference from including it in the build?  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-677405535


   > It's a good idea but we should do this very carefully to avoid the potential conflict from different binaries, like different omp.
   
   Hi @pengzhao-intel we are linking against libmxnet.so when building the external libraries.
   ```
   g++ -shared -fPIC -std=c++11 init_lib.cc min_ex.cc.o ../../../src/lib_api.cc \
           -o libmin_ex.so -I../../../include -L../../../build -lmxnet
   ```
   So there is some expectation that that the library plays nice with whatever is in MXNet (ie. OpenMP). Is there something else im missing? What use-cases around OpenMP should we consider that might cause conflicts?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [WIP] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674552661


   > @samskalicky thanks for starting on this! does it supersede the custom c++ ops?
   
   No, I think C++ custom ops are still useful for those not familiar with MXNet backend or who dont want to build MXNet from source. C++ custom ops are the improvement on Python custom ops, easy to write, easy to use, almost as fast as built-in ops.
   
   External ops will be for those willing to build MXNet from source, or who need to absolute highest performance for the op. In terms of performance we're talking about ops with a short compute time (ie. less than 10ms) where the overhead of the C++ custom op is not tolerable. For ops with lots of computation, the perf will be bounded by the compute and the C++ custom op overhead wont be the bottleneck. 
   
   Im working on a project (at some point soon i'll be able to share more details) using C++ custom ops for subgraphs, and able to achieve performance improvements over built-in MXNet ops. So theres still motivation to keep them for their simplicity. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-677384497


   > Hey folks,
   > 
   > I was playing around and realized we can load anything we want following the example, not just operators. For example, I wrote this little function to dump out the registered operators and map the aliases:
   
   > Sam
   
   It's a good idea but we should do this very carefully to avoid the potential conflict from different binaries, like different omp.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r474732108



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       I found [this](https://discuss.mxnet.io/t/windows-build-libmxnet-dll-with-md-multithreaded-dll-runtime/1516):
   >  mxnet (and all its third parties) are configured to use the static runtime library (/MT), whereas the default behavior of Visual Studio and cmake it to use the dynamic runtime library (aka /MD or “MultiThreaded DLL”).
   
   So I needed to tell windows `cl.exe` to compile with "MT":
   ```
   target_compile_options(customop_gpu_lib PRIVATE "$<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-LD -MT>")
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-677419459


   @samskalicky we've had the situation before where the libmxnet was linked with gnu omp and libmkldnn was linked to intel omp (or might have been the other way around).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680248692


   I think we should expose the MX/NN symbols, unless there's a downside to it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675956750


   Hey folks,
   
   I was playing around and realized we can load anything we want following the example, not just operators. For example, I wrote this little function to dump out the registered operators and map the aliases:
   ```
   #include "operator_common.h"
   
   extern "C" int listOps() {
     // get op registry
     ::dmlc::Registry<::nnvm::Op>* reg = ::dmlc::Registry<::nnvm::Op>::Get();
     // get list of registered op names
     std::vector<std::string> ops = reg->ListAllNames();
   
     // create inverse map of Op to name (to find aliases)
     std::map<const ::nnvm::Op*,std::vector<std::string> > op_map;
     for(auto &name : ops) {
       const ::nnvm::Op* op = reg->Find(name);
       if(op_map.count(op) > 0) {
           if(name.compare(op->name) != 0)
             op_map[op].push_back(name);
       } else {
           op_map[op]={};
           if(name.compare(op->name) != 0)
             op_map[op].push_back(name);
       }
     }
   
     // print out the op mapping
     for(auto &kv : op_map) {
       std::cout << kv.first->name << ", ";
       for(auto &n : kv.second)
         std::cout << n << ", ";
       std::cout << std::endl;
     }
     
     return 0;
   }
   ```
   I compiled this code with MXNet to get an object file, and then built that into a shared library:
   ```
   g++ -shared -fPIC -std=c++11 init_lib.cc op_info.cc.o ../external_ops/src/lib_api.cc -o libop_info.so -I../external_ops/include -L../external_ops/build -lmxnet
   ```
   Then, with this Python code I can load my library into MXNet and at the same time get a handle to the library directly and call my function (now with access to the libmxnet.so):
   ```
   import os
   import ctypes
   import mxnet as mx
   mx.library.load(os.path.abspath('libop_info.so'))
   libop_info = ctypes.CDLL('libop_info.so')
   libop_info.listOps()
   ```
   I was able to get this list of operators out: https://docs.google.com/spreadsheets/d/1BK2UBJW7NFicriYGRpFwZHFi1HYqXwPTTzaY4jSKaWs/edit#gid=0
   
   So, should we consider making this proposal more than just "external operators"? Maybe "external components" that could be anything: operators, subgraph properties, utilities, etc....Thoughts?
   
   Sam


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675001292


   > What about cmake subprojects? As in use MXNet as a subproject of the external library.
   
   Im not familiar with cmake subprojects. Would you be able to put together a prototype of this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky closed pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky closed pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r472696254



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       @yajiedesign specializes in black magic.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675677054


   @samskalicky agreed blocklisting will be hard to maintain as the integrations may change. another way within our control is to require a naming convention for this external operator feature.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680613651


   Dear community members,
   
   After further analysis and help from many people, I have decided that the best way forward is to put this work on hold while we spend some more time improving the C++ APIs of both MXNet and our dependencies (nnvm, dmlc-core). Once we have a stable, maintainable, versioned set of C++ APIs and a consistent process to build and link libmxnet.so with external libraries we'll revisit this proposal. 
   
   In the meantime, some of the refactoring work will continue on in another PR #19016. I will close this PR for now. Thanks for all of your input on this proposal!
   
   Sam


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675119119


   > Since mxnet 2.0 generally aims to simplify the build system, shouldn't we take this case as incentive to simplify it further and remove the complexity that is currently blocking you on this matter? Maybe @pengzhao-intel and his team as well as @ptrendx and his team could help here.
   > The complexity of our build system and dependency.closure are creating various issues on different topics, so finding a proper solution that goes above "#if" and compile time.options would he helpful for the project.
   > 
   
   > What about cmake subprojects? As in use MXNet as a subproject of the external library.
   >
   
   Following up on this after an offline discussion with @leezu, all of the code change in this PR so far is to enable dynamically loading libraries containing operator implementations with our existing library loading capabilities in MXNet (ie. `MXLoadLib` and `mx.library.load`). How users build the operator object files to link into a shared library is totally separate. Whether they put the operator code in the **src/operator** directory and build or if they improve the existing MXNet CMakeLists.txt to enable building as a subproject they get the same library thats loaded via the code in this PR. 
   
   As was mentioned before, cleaning up the CMakeLists.txt can be done in parallel or after this PR is merged. How would guys feel about taking that approach?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r474729746



##########
File path: CMakeLists.txt
##########
@@ -160,6 +160,7 @@ if("$ENV{VERBOSE}" STREQUAL "1")
   message(STATUS " Verbose Makefile ACTIVATED")
   set(CMAKE_VERBOSE_MAKEFILE ON)
 endif()
+set(CMAKE_VERBOSE_MAKEFILE ON)

Review comment:
       removed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476058946



##########
File path: config/linux_gpu.cmake
##########
@@ -33,6 +33,14 @@
 # Specify `cmake --build . --parallel N` to set the number of parallel compilation jobs.
 # Default is derived from CPUs available.
 #
+# By default, cmake will try and discover which GPU architecture to use by looking at
+# the available GPUs on the machine that you're building on. If you want to build for
+# a specific GPU architecture or are building on a machine without a GPU, then
+# specify the MXNET_CUDA_ARCH option like:
+#
+# $ cmake .. -DMXNET_CUDA_ARCH=7.0
+#
+# In the example above we're building for sm_70 which is the Volta architecture.

Review comment:
       To avoid duplication, should we just reference the option and explanation 20 lines below.If you find the explanation below isn't clear, feel free to improve it. It may be better than documenting it twice. WDYT? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-679737968


   > How about components in general? Let's say we would like to externalize the onnx support or some other component? We would still always have to build these things together and specify the features set during compile time, right?
   
   It depends on what "externalize the onnx support" entails. Today the only components we register are: operators, subgraph properties, and graph passes (anything else?). The current onnx support is implemented [entirely in Python](https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/contrib/onnx). 
   
   Maybe we could externalize TensorRT as a custom subgraph property & custom op though.
   
   You dont need to build them together as much as you need to reproduce the same build settings (defines, includes, etc). 
   
   The big reason that you cant just compile your code and link against libmxnet.so is the same as why you cant build a custom application using the C++ API without compiling it with MXNet: it depends on the entire codebase (+3rd party submodules). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476111032



##########
File path: config/linux_gpu.cmake
##########
@@ -30,6 +30,10 @@
 #  $ cmake ..
 #  $ cmake --build .
 #
+#  or you can specify the particular GPU architecture by
+#
+#  $ cmake .. -DMXNET_CUDA_ARCH=7.0

Review comment:
       @leezu how about a compromise, a short example at the top?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] kpuatamazon commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
kpuatamazon commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674961787


   I think this would be much cleaner if it was a separate directory because:
   
   - Version control is much easier.  Just update mxnet or `rm -rf` it without extra cruft
   - This reflects reality much better.  Somebody else builds mxnet for `pip` without knowing about my stuff. MXNet ships a docker in which I can build my thing for binary compatibility.  
   - MXNet should be usable as a submodule.  
   
   So my ideal instructions look more like
   
   1. compile MXNet normally from a clean checkout
   2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.  
   3. `my_op.so` built by my build system
   4. dynamically load shared library into MXNet via `mx.library.load()`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] Zha0q1 commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675794281


   > > Regarding the open question "What symbols do we need to make available in libmxnet.so for external operators that might be stripped out?", note that https://github.com/apache/incubator-mxnet/blob/master/cmake/libmxnet.sym will hide all non-whitelisted symbols and @Zha0q1 intends to enable it for all builds to avoid name clashing with ILP64 BLAS.
   > 
   > Makes sense, probably the best way to merge these two features is to just explicitly exclude the symbols causing the clashing with ILP64 BLAS. The inverse of deciding which symbols in mxnet to strip out while still leaving the ones users intend to use for external ops might be more difficult to do and maintain.
   
   Do you mean something like this: https://github.com/Zha0q1/incubator-mxnet/blob/static_openblas/cmake/exclude_openblas.ver?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] marcoabreu commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-679707765


   So that does mean a workflow where people pip install mxnet and then build their own operator on top will not be possible? They will still have to maintain a fork, although the separation is a bit clearer now?
   
   I think we can draw a lot of benefits if one does not have to compile mxnet itself as that would allow us to start working with optional features as separately disturbed and maintained components. Right now there seems to be a very tight coupling and I understand where it comes from, but the question is whether we see the vision as valuable and if we can work out a plan that would enable that way. I understand that the current build system is not built with that in mind, but mxnet 2.0 would give us the opportunity to break some things to pave the path.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680301323


   > @samskalicky, can clarify which symbols are needed for the external operators you are interested in?
   
   Not sufficiently, anything that anybody uses to write an internal/backend operator could be anything in MXNet/NNVM/TVM/mshadow/etc... so if the goal of the feature is enable any backend operator to be dynamically loaded, we have to export all symbols.
   
   The biggest problem with individual symbol exposure many symbols needed arent entirely defined in MXNet source code. One example is that the operator registration macro `NNVM_REGISTER_OP` resolves to something that calls `::dmlc::Registry<::nnvm::Op>::Get()` which is defined in **3rdparty/tvm/nnvm/src/core/op.cc**. So unless we go and modify that 3rd party code we're stuck.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476023043



##########
File path: config/linux_gpu.cmake
##########
@@ -33,6 +33,14 @@
 # Specify `cmake --build . --parallel N` to set the number of parallel compilation jobs.
 # Default is derived from CPUs available.
 #
+# By default, cmake will try and discover which GPU architecture to use by looking at
+# the available GPUs on the machine that you're building on. If you want to build for
+# a specific GPU architecture or are building on a machine without a GPU, then
+# specify the MXNET_CUDA_ARCH option like:
+#
+# $ cmake .. -DMXNET_CUDA_ARCH=7.0
+#
+# In the example above we're building for sm_70 which is the Volta architecture.

Review comment:
       @leezu @szha heres the doc changes we discussed offline for GPU builds specifying the architecture manually instead of auto discovering




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-679714368


   > So that does mean a workflow where people pip install mxnet and then build their own operator on top will not be possible? They will still have to maintain a fork, although the separation is a bit clearer now?
   > 
   > I think we can draw a lot of benefits if one does not have to compile mxnet itself as that would allow us to start working with optional features as separately disturbed and maintained components. Right now there seems to be a very tight coupling and I understand where it comes from, but the question is whether we see the vision as valuable and if we can work out a plan that would enable that way. I understand that the current build system is not built with that in mind, but mxnet 2.0 would give us the opportunity to break some things to pave the path.
   
   Yes, external operators will not be possible this way. If you want to not build MXNet you can use [C++ Custom Ops](https://github.com/apache/incubator-mxnet/tree/master/example/extensions/lib_custom_op) (but not external ops)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-679528549


   > So my ideal instructions look more like
   > 
   > 1. compile MXNet normally from a clean checkout
   > 2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.
   > 3. `my_op.so` built by my build system
   > 4. dynamically load shared library into MXNet via `mx.library.load()`
   
   @kpuatamazon Heres another idea. In order to build files that used to be in **src/operator** for example, we need to have all the defines, includes, etc that is normally set in the MXNet build:
   ```
   [ 98%] Building CXX object CMakeFiles/external_lib.dir/example/extensions/lib_external_ops/min_ex.cc.o
   /usr/bin/ccache /usr/bin/c++  -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_INT64_TENSOR_SIZE=0 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=0 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LIBJPEG_TURBO=0 -DMXNET_USE_MKLDNN=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DNDEBUG=1 -D__USE_XOPEN2K8 -Dexternal_lib_EXPORTS -I/home/ubuntu/external_ops/3rdparty/mkldnn/include -I/home/ubuntu/external_ops/build/3rdparty/mkldnn/include -I/home/ubuntu/external_ops/include -I/home/ubuntu/external_ops/src -I/home/ubuntu/external_ops/3rdparty/tvm/nnvm/include -I/home/ubuntu/external_ops/3rdparty/tvm/include -I/home/ubuntu/external_ops/3rdparty/dmlc-core/include -I/home/ubuntu/external_ops/3rdparty/dlpack/include -I/home/ubuntu/externa
 l_ops/include/mxnet -I/home/ubuntu/external_ops/3rdparty/mshadow -I/home/ubuntu/external_ops/3rdparty/mkldnn/src/../include -I/home/ubuntu/external_ops/build/3rdparty/dmlc-core/include -isystem /usr/local/include -isystem /usr/local/include/opencv4  -Wall -Wno-sign-compare -O3 -fopenmp -fPIC   -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -std=gnu++1z -o CMakeFiles/external_lib.dir/example/extensions/lib_external_ops/min_ex.cc.o -c /home/ubuntu/external_ops/example/extensions/lib_external_ops/min_ex.cc
   [ 98%] Linking CXX shared library libexternal_lib.so
   /usr/local/lib/python3.6/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/external_lib.dir/link.txt --verbose=1
   /usr/bin/c++ -fPIC   -Wall -Wno-sign-compare -O3 -fopenmp  -shared -Wl,-soname,libexternal_lib.so -o libexternal_lib.so CMakeFiles/external_lib.dir/example/extensions/lib_external_ops/init_lib.cc.o CMakeFiles/external_lib.dir/example/extensions/lib_external_ops/min_ex.cc.o CMakeFiles/external_lib.dir/src/lib_api.cc.o -Wl,-rpath,/home/ubuntu/external_ops/build:/usr/local/lib:/home/ubuntu/external_ops/build/3rdparty/openmp/runtime/src libmxnet.so 3rdparty/mkldnn/src/libdnnl.a /usr/local/lib/libopenblas.so /usr/lib/x86_64-linux-gnu/librt.so /usr/local/lib/libopencv_highgui.so.4.2.0 /usr/local/lib/libopencv_videoio.so.4.2.0 /usr/local/lib/libopencv_imgcodecs.so.4.2.0 /usr/local/lib/libopencv_imgproc.so.4.2.0 /usr/local/lib/libopencv_core.so.4.2.0 3rdparty/openmp/runtime/src/libomp.so -ldl -lpthread -lpthread -llapack 3rdparty/dmlc-core/libdmlc.a /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.so -lpthread -lrt 
   ```
   After all, the use case for this feature is to be able to use custom components in a publicly released build of MXNet (without those components compiled in statically). So initially, those components were build right as part of an MXNet build at some point. So they will need to continue being built the same way. With all of MXNet's complicated layout (includes all over the place, not just in **include/mxnet** directory) and 3rd submodules its not currently possible to just build MXNet and link it against a custom library. The hardest part is figuring out all the defines/includes/etc to compile a library with. 
   
   In this new idea, theres a build target setup called "external_lib" in MXNet's CMakeLists.txt (just like we already have for other extensions examples like "pass_lib") that compiles all *.cc and *.cu files in the **example/extensions/lib_external_ops** directory. After copying your custom files into the **lib_external_ops** directory you can just go and run `cmake` and `make` to build your custom library. 
   
   Ive updated the README for the example as:
   https://github.com/apache/incubator-mxnet/pull/18904/files#diff-70cbaa0e978356ecb01db8beb907ab48R31-R38
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r474245887



##########
File path: CMakeLists.txt
##########
@@ -160,6 +160,7 @@ if("$ENV{VERBOSE}" STREQUAL "1")
   message(STATUS " Verbose Makefile ACTIVATED")
   set(CMAKE_VERBOSE_MAKEFILE ON)
 endif()
+set(CMAKE_VERBOSE_MAKEFILE ON)

Review comment:
       sure, this is just debugging the windows build issues on GPU




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r474239441



##########
File path: CMakeLists.txt
##########
@@ -160,6 +160,7 @@ if("$ENV{VERBOSE}" STREQUAL "1")
   message(STATUS " Verbose Makefile ACTIVATED")
   set(CMAKE_VERBOSE_MAKEFILE ON)
 endif()
+set(CMAKE_VERBOSE_MAKEFILE ON)

Review comment:
       Does specifying `VERBOSE=1` when running `make`, or `-v` when running ninja not work?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674967821


   > I think this would be much cleaner if it was a separate directory because:
   > 
   > * Version control is much easier.  Just update mxnet or `rm -rf` it without extra cruft
   > * This reflects reality much better.  Somebody else builds mxnet for `pip` without knowing about my stuff. MXNet ships a docker in which I can build my thing for binary compatibility.
   > * MXNet should be usable as a submodule.
   > 
   > So my ideal instructions look more like
   > 
   > 1. compile MXNet normally from a clean checkout
   > 2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.
   > 3. `my_op.so` built by my build system
   > 4. dynamically load shared library into MXNet via `mx.library.load()`
   
   Agreed, if only MXNet codebase was better organized. We have headers/includes scattered throughout the codebase. Not just in *include/mxnet**. For example `mxnet_op.h` in in **src/operator** and [`include/mshadow/base.h` includes `mkl_blas.h`](https://github.com/apache/incubator-mxnet/blob/2610c10701c2b8155dbf094aaecba37ebbf67d0f/3rdparty/mshadow/mshadow/base.h#L173) which isnt in the MXNet codebase. Duplicating and reproducing the MXNet cmake flow for building custom operators is not worth the hassle/maintenance. 
   
   If others have ideas that they wanna give it a whirl, feel free to checkout this branch and try building `min_ex.cc` in the `lib_external_ops` directory. Happy to collaborate on this. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r474239441



##########
File path: CMakeLists.txt
##########
@@ -160,6 +160,7 @@ if("$ENV{VERBOSE}" STREQUAL "1")
   message(STATUS " Verbose Makefile ACTIVATED")
   set(CMAKE_VERBOSE_MAKEFILE ON)
 endif()
+set(CMAKE_VERBOSE_MAKEFILE ON)

Review comment:
       Please remove this. Instead you should specify `VERBOSE=1` when running `make`, or `-v` when running ninja.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680246548


   > Do you mean something like this: https://github.com/Zha0q1/incubator-mxnet/blob/static_openblas/cmake/exclude_openblas.ver?
   
   Thanks @Zha0q1, @szha for windows the default is not to export any symbols (according to this: https://stackoverflow.com/q/29038914). So currently im getting this error:
   ```
   in_ex.cc.obj : error LNK2019: unresolved external symbol "public: static class dmlc::Registry<class nnvm::Op> * __cdecl dmlc::Registry<class nnvm::Op>::Get(void)" (?Get@?$Registry@VOp@nnvm@@@dmlc@@SAPEAV12@XZ) referenced in function "void __cdecl mxnet::op::`dynamic initializer for '__make_NnvmOp_min_ex0''(void)" (??__E__make_NnvmOp_min_ex0@op@mxnet@@YAXXZ)
   ```
   This seems to stem from missing exported symbols for nnvm stuff. Do we want to expose the identical set of symbols between windows/linux? 
   
   If not, then we should have an identical library loading procedure (like C++ custom ops) where we re-register components from the library into MXNet's internal registry. But this will limit the types of components a user can put in their library by explicitly requiring MXNet to implement support for each (instead of just making the symbols exposed and allowing users to load libraries that just hook into those symbols). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680298551


   Being explicit about the exported / supported symbols and thus components is preferable, as otherwise it becomes unclear which symbols are tracked as part of semantic versioning and which aren't. In fact, instead of setting WINDOWS_EXPORT_ALL_SYMBOLS, the best practice is actually to make gcc follow the Windows default of hiding symbols by default and only exposing whitelisted symbols. You can refer to the CPPCon talk: https://crascit.com/wp-content/uploads/2019/09/Deep-CMake-For-Library-Authors-Craig-Scott-CppCon-2019.pdf https://www.youtube.com/watch?v=m0DwB4OvDXk
   
   Before proceeding here, I think we need to answer the question "What symbols do we need to make available in libmxnet.so for external operators?" @samskalicky, can clarify which symbols are needed for the external operators you are interested in?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky edited a comment on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674967821


   > I think this would be much cleaner if it was a separate directory because:
   > 
   > * Version control is much easier.  Just update mxnet or `rm -rf` it without extra cruft
   > * This reflects reality much better.  Somebody else builds mxnet for `pip` without knowing about my stuff. MXNet ships a docker in which I can build my thing for binary compatibility.
   > * MXNet should be usable as a submodule.
   > 
   > So my ideal instructions look more like
   > 
   > 1. compile MXNet normally from a clean checkout
   > 2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.
   > 3. `my_op.so` built by my build system
   > 4. dynamically load shared library into MXNet via `mx.library.load()`
   
   Agreed, if only MXNet codebase was better organized. We have headers/includes scattered throughout the codebase. Not just in **include/mxnet**. For example `mxnet_op.h` in in **src/operator** and [`include/mshadow/base.h` includes `mkl_blas.h`](https://github.com/apache/incubator-mxnet/blob/2610c10701c2b8155dbf094aaecba37ebbf67d0f/3rdparty/mshadow/mshadow/base.h#L173) which isnt in the MXNet codebase. Duplicating and reproducing the MXNet cmake flow for building custom operators is not worth the hassle/maintenance. 
   
   If others have ideas that they wanna give it a whirl, feel free to checkout this branch and try building `min_ex.cc` in the `lib_external_ops` directory. Happy to collaborate on this. 
   
   I added a `test` target in the Makefile in `lib_external_ops` to try and compile/link as you suggest:
   https://github.com/apache/incubator-mxnet/pull/18904/files#diff-8a8d486c6b362b11bec05cdd67b3c3bdR32-R33
   But currently its failing with:
   ```
   In file included from ../../../include/mshadow/tensor.h:35:0,
                    from ../../../include/mxnet/base.h:33,
                    from ../../../src/operator/mxnet_op.h:30,
                    from min_ex-inl.h:26,
                    from min_ex.cc:20:
   ../../../include/mshadow/./base.h:173:12: fatal error: mkl_blas.h: No such file or directory
      #include <mkl_blas.h>
               ^~~~~~~~~~~~
   ```
   Im happy to be wrong, but I think this is an indication of more to come in terms of managing a complex set of includes/dependencies that will constantly be changing. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476063390



##########
File path: config/linux_gpu.cmake
##########
@@ -33,6 +33,14 @@
 # Specify `cmake --build . --parallel N` to set the number of parallel compilation jobs.
 # Default is derived from CPUs available.
 #
+# By default, cmake will try and discover which GPU architecture to use by looking at
+# the available GPUs on the machine that you're building on. If you want to build for
+# a specific GPU architecture or are building on a machine without a GPU, then
+# specify the MXNET_CUDA_ARCH option like:
+#
+# $ cmake .. -DMXNET_CUDA_ARCH=7.0
+#
+# In the example above we're building for sm_70 which is the Volta architecture.

Review comment:
       lol, ya i totally didnt see that below. Plus I dont think I would have understood what it was saying. I needed a dumbed down version like "heres how to compile for a specific GPU...". 
   
   Do we expect users will just build on the machine with the GPU that they intend to use? So the simple `cmake ..` is all they need? Who are these instructions for? If they were for me, they would say heres how to build for a specific GPU :D




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18904: [WIP] MXNet external operators

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-672064501


   Hey @samskalicky , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [clang, website, sanity, unix-gpu, windows-gpu, centos-gpu, windows-cpu, edge, unix-cpu, miscellaneous, centos-cpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674954144


   > What sort of binary compatibility guarantees does MXNet make for this interface? If the safe option is always to compile at the same time, I don't see much difference from including it in the build?
   
   This interface makes no binary compatibility guarantees. You would need to compile and link your operators agains the version of MXNet you intend to load your library into. Further, you would need to build it in the same environment as MXNet (gcc, glibc, etc).
   
   The main benefits of between external operators over built-in operators are:
   1. They do not need to be part of the MXNet codebase (no PR, no community approval, etc). This also lets you more rapidly innovate on your custom operators since you dont need it to be part of a formal MXNet release or go through the PR process. Bug fixes on your custom operators can happen in parallel outside of the MXNet community. 
   2. You can distribute your custom ops separately from MXNet. For example, if you already have another Python package that you distribute (ie. as a wheel) you can package your operators within that package and dynamically load them into MXNet.
   
   Of course, you can still always build and distribute your own fork of MXNet with your custom operators in it. 
   
   This style of external operator is similar to how other DL frameworks support custom operators. So if you're familiar with writing custom operators for TF, PT, etc this is exactly the same. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r472659420



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       @ptrendx @DickJC123 can you guys help with this windows linking error when compiling with CUDA?
   ```
   [2020-08-18T07:32:25.378Z] cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --intdir=CMakeFiles\customop_gpu_lib.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100162~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100162~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~1\2019\COMMUN~1\VC\Tools\MSVC\1425~1.286\bin\Hostx64\x64\link.exe /nologo CMakeFiles\customop_gpu_lib.dir\example\extensions\lib_custom_op\relu_lib.cu.obj CMakeFiles\customop_gpu_lib.dir\src\lib_api.cc.obj  /out:libcustomop_gpu_lib.dll /implib:customop_gpu_lib.lib /pdb:libcustomop_gpu_lib.pdb /dll /version:0.0 /machine:x64  /INCREMENTAL:NO /OPT:REF /OPT:ICF -LIBPATH:C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\lib\x64 cudadevrt.lib  cudart_static.lib  cudadevrt.lib  cudart_static.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib  && cd ."
   [2020-08-18T07:32:25.378Z] LINK: command "C:\PROGRA~2\MICROS~1\2019\COMMUN~1\VC\Tools\MSVC\1425~1.286\bin\Hostx64\x64\link.exe /nologo CMakeFiles\customop_gpu_lib.dir\example\extensions\lib_custom_op\relu_lib.cu.obj CMakeFiles\customop_gpu_lib.dir\src\lib_api.cc.obj /out:libcustomop_gpu_lib.dll /implib:customop_gpu_lib.lib /pdb:libcustomop_gpu_lib.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /OPT:REF /OPT:ICF -LIBPATH:C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\lib\x64 cudadevrt.lib cudart_static.lib cudadevrt.lib cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:libcustomop_gpu_lib.dll.manifest" failed (exit code 1120) with the following output:
   [2020-08-18T07:32:25.378Z] lib_api.cc.obj : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease' in relu_lib.cu.obj
   ```
   https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-18904/17/pipeline
   
   The gist is:
   ```
   lib_api.cc.obj : error LNK2038: mismatch detected for 'RuntimeLibrary': 
   value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease' in relu_lib.cu.obj
   ```
   But I have no idea why one file would be compiled differently in the same set of sources here. The other non-GPU libraries (customop_lib, subgraph_lib, etc) seem to compile/link fine.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476705952



##########
File path: config/linux_gpu.cmake
##########
@@ -42,15 +43,18 @@ set(USE_CUDA ON CACHE BOOL "Build with CUDA support")
 set(USE_CUDNN ON CACHE BOOL "Build with cudnn support, if found")
 
 # Target NVIDIA GPU achitecture.
-# Valid options are "Auto" for autodetection, "All" for all available
-# architectures or a list of architectures by compute capability number, such as
-# "7.0" or "7.0;7.5" as well as name, such as "Volta" or "Volta;Turing".
+# Valid options are:
+#   - "Auto" for autodetection, will try and discover which GPU architecture to use by
+#            looking at the available GPUs on the machine that you're building on
+#   - "All" for all available GPU architectures supported by the version of CUDA installed
+#   - "specific GPU architectures" by giving the compute capability number such as
+#            "7.0" or "7.0;7.5" (ie. sm_70 or sm_75) or you can specify the name like:
+#            "Volta" or "Volta;Turing".

Review comment:
       @leezu I tried to merge my version which what was there before. Hows this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-679581093


   @kpuatamazon I had a followup idea, I moved all the library specific build stuff into a **CMakeLists.txt** in the **example/extensions/lib_external_ops** directory. So now all of your library-specific build "stuff" is there and not in the main MXNet **CMakeLists.txt**. In this way, building a custom library does not require changing the main MXNet **CMakeLists.txt**, instead only the small/specific one for your custom library. 
   
   The main MXNet **CMakeLists.txt** just refers to the one in the **lib_external_ops** directory:
   https://github.com/apache/incubator-mxnet/pull/18904/files#diff-af3b638bc2a3e6c650974192a53c7291R709
   
   You still have to build MXNet, and then use MXNet's **CMakeLists.txt** to generate the Makefile for your custom library. But at least now the control is completely on the custom library. You can clone MXNet, build it, delete all the example files in **example/extensions/lib_external_ops**, drop in all of your files in the same directory, and build just the "external_lib" target. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-677425208


   > @samskalicky we've had the situation before where the libmxnet was linked with gnu omp and libmkldnn was linked to intel omp (or might have been the other way around).
   
   Sounds vaguely familiar.....but what would we want the behavior to be in the case of "external components"? Are we saying that mx/lib must have the same omp? Or is there a way to have both coexist? Or try and detect a mismatch between mx/lib? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r472692516



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       :shrug: Maybe? Compiling for Windows is black magic.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r472667585



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       Is this all I was missing?
   https://github.com/apache/incubator-mxnet/pull/18904/commits/04e88fb3c3f9daaa2bf7448b65678a20eff929be




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r472659420



##########
File path: CMakeLists.txt
##########
@@ -708,18 +708,18 @@ endif()
 target_compile_definitions(mxnet PUBLIC DMLC_LOG_FATAL_THROW=$<BOOL:${LOG_FATAL_THROW}>)
 
 # extension libraries (custom operators, custom subgraphs) are built by default
-add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
-add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc)
-add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc)
-add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
-add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc)
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
+add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)
 target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposecsr_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(transposerowsp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 target_include_directories(pass_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
 if(USE_CUDA)
-  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc)

Review comment:
       @ptrendx @DickJC123 can you guys help with this windows linking error when compiling with CUDA?
   ```
   [2020-08-18T07:32:25.378Z] cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --intdir=CMakeFiles\customop_gpu_lib.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100162~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100162~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~1\2019\COMMUN~1\VC\Tools\MSVC\1425~1.286\bin\Hostx64\x64\link.exe /nologo CMakeFiles\customop_gpu_lib.dir\example\extensions\lib_custom_op\relu_lib.cu.obj CMakeFiles\customop_gpu_lib.dir\src\lib_api.cc.obj  /out:libcustomop_gpu_lib.dll /implib:customop_gpu_lib.lib /pdb:libcustomop_gpu_lib.pdb /dll /version:0.0 /machine:x64  /INCREMENTAL:NO /OPT:REF /OPT:ICF -LIBPATH:C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\lib\x64 cudadevrt.lib  cudart_static.lib  cudadevrt.lib  cudart_static.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib  && cd ."
   [2020-08-18T07:32:25.378Z] LINK: command "C:\PROGRA~2\MICROS~1\2019\COMMUN~1\VC\Tools\MSVC\1425~1.286\bin\Hostx64\x64\link.exe /nologo CMakeFiles\customop_gpu_lib.dir\example\extensions\lib_custom_op\relu_lib.cu.obj CMakeFiles\customop_gpu_lib.dir\src\lib_api.cc.obj /out:libcustomop_gpu_lib.dll /implib:customop_gpu_lib.lib /pdb:libcustomop_gpu_lib.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /OPT:REF /OPT:ICF -LIBPATH:C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\lib\x64 cudadevrt.lib cudart_static.lib cudadevrt.lib cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:libcustomop_gpu_lib.dll.manifest" failed (exit code 1120) with the following output:
   [2020-08-18T07:32:25.378Z] lib_api.cc.obj : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease' in relu_lib.cu.obj
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674991226


   > Since mxnet 2.0 generally aims to simplify the build system, shouldn't we take this case as incentive to simplify it further and remove the complexity that is currently blocking you on this matter? Maybe @pengzhao-intel and his team as well as @ptrendx and his team could help here.
   > 
   > The complexity of our build system and dependency.closure are creating various issues on different topics, so finding a proper solution that goes above "#if" and compile time.options would he helpful for the project.
   
   Thats a good point, and would be awesome to make this easier for users. I would prefer not to block this feature on that reorganization though. Instead I would propose moving forward with the PR by building external ops as suggested above. When the build system simplifies we can simplify building external ops. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675647299


   > Regarding the open question "What symbols do we need to make available in libmxnet.so for external operators that might be stripped out?", note that https://github.com/apache/incubator-mxnet/blob/master/cmake/libmxnet.sym will hide all non-whitelisted symbols and @Zha0q1 intends to enable it for all builds to avoid name clashing with ILP64 BLAS.
   
   Makes sense, probably the best way to merge these two features is to just explicitly exclude the symbols causing the clashing with ILP64 BLAS. The inverse of deciding which symbols in mxnet to strip out while still leaving the ones users intend to use for external ops might be more difficult to do and maintain.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky edited a comment on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674967821


   > I think this would be much cleaner if it was a separate directory because:
   > 
   > * Version control is much easier.  Just update mxnet or `rm -rf` it without extra cruft
   > * This reflects reality much better.  Somebody else builds mxnet for `pip` without knowing about my stuff. MXNet ships a docker in which I can build my thing for binary compatibility.
   > * MXNet should be usable as a submodule.
   > 
   > So my ideal instructions look more like
   > 
   > 1. compile MXNet normally from a clean checkout
   > 2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.
   > 3. `my_op.so` built by my build system
   > 4. dynamically load shared library into MXNet via `mx.library.load()`
   
   Agreed, if only MXNet codebase was better organized. We have headers/includes scattered throughout the codebase. Not just in **include/mxnet**. For example `mxnet_op.h` in in **src/operator** and [`include/mshadow/base.h` includes `mkl_blas.h`](https://github.com/apache/incubator-mxnet/blob/2610c10701c2b8155dbf094aaecba37ebbf67d0f/3rdparty/mshadow/mshadow/base.h#L173) which isnt in the MXNet codebase. Duplicating and reproducing the MXNet cmake flow for building custom operators is not worth the hassle/maintenance. 
   
   If others have ideas that they wanna give it a whirl, feel free to checkout this branch and try building `min_ex.cc` in the `lib_external_ops` directory. Happy to collaborate on this. 
   
   I added a `test` target in the Makefile in `lib_external_ops` to try and compile/link as you suggest:
   https://github.com/apache/incubator-mxnet/pull/18904/files#diff-8a8d486c6b362b11bec05cdd67b3c3bdR32-R33
   But currently its failing with:
   ```
   In file included from ../../../include/mshadow/tensor.h:35:0,
                    from ../../../include/mxnet/base.h:33,
                    from ../../../src/operator/mxnet_op.h:30,
                    from min_ex-inl.h:26,
                    from min_ex.cc:20:
   ../../../include/mshadow/./base.h:173:12: fatal error: mkl_blas.h: No such file or directory
      #include <mkl_blas.h>
               ^~~~~~~~~~~~
   ```
   Im happy to be wrong, but I think this is an indication of more to come in terms of managing a complex set of includes/dependencies that will constantly be changing. And we havent even gotten into the cmake options mapping to defines/compileOptions/etc


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] marcoabreu commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-679720674


   How about components in general? Let's say we would like to externalize the onnx support or some other component? We would still always have to build these things together and specify the features set during compile time, right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] marcoabreu commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-676178257


   Love the idea! This could open the path towards a modular architecture where additional (optional) components can be built and distributed separately - this would allow us to reduce complexity and we can also get rid of some licensing issues since certain components can be maintained by third parties. 
   
   For example, the TensorRT or TVM integration would no longer be a separate build flag but an external component that would be loaded if it is in the appropriate folder - from the operators over all the other methods that are needed to integrate these libraries as well as things that we might not want in the product but the component creator would still like to have. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476705576



##########
File path: config/linux_gpu.cmake
##########
@@ -24,7 +24,8 @@
 #
 #  $ cp config/linux_gpu.cmake config.cmake
 #
-#  Next modify the according entries, and then compile by
+#  Next modify the entries in the config.cmake like MXNET_CUDA_ARCH to set the specific
+#  GPU architecture, and then compile by

Review comment:
       Thanks @leezu for that feedback, how about this where we point out that users might want to set the MXNET_CUDA_ARCH when using the linux_gpu.cmake file. And then let them refer below for the specific details. This at least points out that they might need to do something with MXNET_CUDA_ARCH in order to build for GPU




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675629514


   Regarding the open question "What symbols do we need to make available in libmxnet.so for external operators that might be stripped out?", note that https://github.com/apache/incubator-mxnet/blob/master/cmake/libmxnet.sym will hide all non-whitelisted symbols and @Zha0q1 intends to enable it for all builds to avoid name clashing with ILP64 BLAS.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-675646008


   todo: https://github.com/apache/incubator-mxnet/pull/18894#pullrequestreview-469709889


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-680282254


   > I think we should expose the MX/NN symbols, unless there's a downside to it.
   
   The complication is how we go about doing the exporting. @yajiedesign maybe can add some more details, but a cursory search shows that you can use a cmake property WINDOWS_EXPORT_ALL_SYMBOLS to export everything and make it in line with how gcc works. But this will bloat the already bloated windows dlls. The other way to do it would be to go to the source files and add `__declspec(dllexport)` to only the symbol declarations we want to export. This is probably not feasible since it would mean we need to modify the 3rd party submodules as well. Plus, we'd need to do a bunch of work to figure out the set of symbols required manually. 
   
   So really the only option is to use a cmake property WINDOWS_EXPORT_ALL_SYMBOLS and accept the bloat. Unless we drop windows and just say that this "external components" feature is linux only (C++ Custom operators can still be used on windows). 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] kpuatamazon commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
kpuatamazon commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674993612


   What about cmake subprojects?  As in use MXNet as a subproject of the external library.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] kpuatamazon edited a comment on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
kpuatamazon edited a comment on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674961787


   I think this would be much cleaner if it was a separate directory because:
   
   - Version control is much easier.  Just update mxnet or `rm -rf` it without extra cruft
   - This reflects reality much better.  Somebody else builds mxnet for `pip` without knowing about my stuff. MXNet ships a docker in which I can build my thing for binary compatibility.  (I think half of this already exists for running tests.)
   - MXNet should be usable as a submodule.  
   
   So my ideal instructions look more like
   
   1. compile MXNet normally from a clean checkout
   2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.  
   3. `my_op.so` built by my build system
   4. dynamically load shared library into MXNet via `mx.library.load()`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #18904: [WIP] MXNet external operators

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674550005


   @samskalicky thanks for starting on this! does it supersede the custom c++ ops?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674591733


   Hi MXNet community,
   
   I would like to propose another feature in MXNet Extensions for "external ops". Please check out the detailed summary [1].
   
   Thanks!
   Sam
   
   
   [1] https://github.com/apache/incubator-mxnet/pull/18904


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#discussion_r476588681



##########
File path: config/linux_gpu.cmake
##########
@@ -30,6 +30,10 @@
 #  $ cmake ..
 #  $ cmake --build .
 #
+#  or you can specify the particular GPU architecture by
+#
+#  $ cmake .. -DMXNET_CUDA_ARCH=7.0

Review comment:
       @samskalicky I'm totally for highlighting the `MXNET_CUDA_ARCH` option at the top of this file. My concern is more general, in that a few months ago we started of with asking users to specify options via the `-D` arguments to `cmake ..`, but reviewers pushed for using a config file. Now you documented inside the config file how to use the `-D` option.
   
   Thus let's be consistent and highlight the `MXNET_CUDA_ARCH` option, while recommending users to edit the value of `MXNET_CUDA_ARCH` 20 lines below. I'm afraid there may be confusion if we ask inside the config file that users should use the `-D` option to overwrite the config file. WDYT?
   
   We may also add a README file in the `config/` directory and explain that all optiions can be overwritten by passing a new value via `-D` to `cmake`.
   
   Finally, your initial explanation for the `-DMXNET_CUDA_ARCH` option was easy to understand, so it would be great to use it to improve the existing documentation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] marcoabreu commented on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674987945


   Since mxnet 2.0 generally aims to simplify the build system, shouldn't we take this case as incentive to simplify it further and remove the complexity that is currently blocking you on this matter? Maybe @pengzhao-intel and his team as well as @ptrendx and his team could help here. 
   
   The complexity of our build system and dependency.closure are creating various issues on different topics, so finding a proper solution that goes above "#if" and compile time.options would he helpful for the project.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky edited a comment on pull request #18904: [RFC] MXNet external operators

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on pull request #18904:
URL: https://github.com/apache/incubator-mxnet/pull/18904#issuecomment-674967821


   > I think this would be much cleaner if it was a separate directory because:
   > 
   > * Version control is much easier.  Just update mxnet or `rm -rf` it without extra cruft
   > * This reflects reality much better.  Somebody else builds mxnet for `pip` without knowing about my stuff. MXNet ships a docker in which I can build my thing for binary compatibility.
   > * MXNet should be usable as a submodule.
   > 
   > So my ideal instructions look more like
   > 
   > 1. compile MXNet normally from a clean checkout
   > 2. cd into your own project, configure with `-DMXNET=/path/to/mxnet` and compile.  Sample project provided and part of integration tests.
   > 3. `my_op.so` built by my build system
   > 4. dynamically load shared library into MXNet via `mx.library.load()`
   
   Agreed, if only MXNet codebase was better organized. We have headers/includes scattered throughout the codebase. Not just in **include/mxnet**. For example `mxnet_op.h` in in **src/operator** and [`include/mshadow/base.h` includes `mkl_blas.h`](https://github.com/apache/incubator-mxnet/blob/2610c10701c2b8155dbf094aaecba37ebbf67d0f/3rdparty/mshadow/mshadow/base.h#L173) which isnt in the MXNet codebase. Duplicating and reproducing the MXNet cmake flow for building custom operators is not worth the hassle/maintenance. 
   
   If others have ideas that they wanna give it a whirl, feel free to checkout this branch and try building `min_ex.cc` in the `lib_external_ops` directory. Happy to collaborate on this. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org